(57) Abstract

Compositions and methods for
the detection and therapy of cancer are
disclosed. The compounds provided
include human endogenous retroviral
sequences that are preferentially expressed
in tumor tissue, as well as polypeptides
encoded by such nucleotide sequences.
Vaccines and pharmaceutical compositions
comprising such compounds are also
provided and may be used, for example,
for the prevention and treatment of cancer.
The polypeptides may also be used for
the production of antibodies, which are
useful for diagnosing and monitoring the
progression of cancer in a patient.
TTA GAG ACC CAA TTC GGA CCT AAT TGC CAC CCA AAT TTC TCA ACT CGA
Leu Glu Thr Gln Leu Gly Pro Asn Trp Asp Pro Asn Phe Ser Ser Gly
1 5 10 15

GGG AGA ACT m GAC GAT TTC CAC CGG TAT CTC CTC GTG GGT ATT CAG
Gly Arg Thr Phe Asp Asp Phe His Arg Tyr Leu Leu Val Gly Ile Gln

GGA GCT GCC CAG AAA CCT ATA AAC TTG TCT AAG GCG ATT GAA GTC GTC 144
Gly Ala Ala Gln Lys Pro Ile Asn Leu Ser Lys Ala Ile Glu Val Val
35 40 45

CAG 0^ CAT GAT GAG TCA CCA GGA GTG TTT TTA GAG CAC CTC CAG GAG 192
Gln Gly His Asp Glu Ser Pro Gly Val Phe Leu Glu His Leu Gln Glu
60

GCT TAT CGG ATT TAC ACC CCT TTT GAC CTG GCA GCC CCC GAA AAT AGC 240
Ala Tyr Arg Ile Tyr Thr Pro Phe Asp Leu Ala Ala Pro Glu Asn Ser
65 70 75 80

CAT GCT CTT AAT TTG CCA TTT GTG GCT CAG GCA GCC CCA GAT AGT AAA 288
His Ala Leu Asn Leu Ala Phe Val Ala Gln Ala Ala Pro Asp Ser Lys
85 90 95

WSG AAA CTC CAA Wtt CTA GAG GGA TTT TGC TGG AAT GAA TAC CAG TCA 336
Arg Lys Leu Gln Lys Leu Glu Gly Phe Cys Trp Asn Glu Tyr Gln Ser
100 105

GCT TTT AGA GAT AGC CTA AAA GGT TTT 363
Ala Phe Arg Asp Ser Leu Lys Gly Phe
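
The pairing of codons with three-letter residues in the listing follows the standard genetic code. As a minimal sketch (not part of the original disclosure), the row ending at nucleotide 240 can be re-translated in Python to check the residues printed beneath it. The partial codon table and the function name `translate` are illustrative assumptions; only the codons needed for this one row are included.

```python
# Partial standard genetic code: only the codons that occur in the chosen row.
CODON_TABLE = {
    "GCT": "Ala", "GCA": "Ala", "GCC": "Ala",
    "TAT": "Tyr", "TAC": "Tyr",
    "CGG": "Arg", "ATT": "Ile", "ACC": "Thr",
    "CCT": "Pro", "CCC": "Pro",
    "TTT": "Phe", "GAC": "Asp", "CTG": "Leu",
    "GAA": "Glu", "AAT": "Asn", "AGC": "Ser",
}


def translate(codons: str) -> list[str]:
    """Translate space-separated DNA codons into three-letter residue names."""
    return [CODON_TABLE[codon] for codon in codons.split()]


# Row of the listing that ends at nucleotide 240 (residues 65 to 80).
ROW = "GCT TAT CGG ATT TAC ACC CCT TTT GAC CTG GCA GCC CCC GAA AAT AGC"
residues = translate(ROW)
print(" ".join(residues))
```

The same check could be applied to the other rows with a complete codon table, which is one way to spot codons that were damaged during text extraction.
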
Description

COMPOSITIONS AND METHODS FOR THE TREATMENT 
AND DIAGNOSIS OF CANCER 

Technical Field 

The present invention relates generally to the detection and therapy of
cancer. The invention is more specifically related to nucleotide sequences that are
preferentially expressed in a tumor tissue and to polypeptides encoded by such
nucleotide sequences. The invention is more particularly related to nucleotide
sequences comprising at least a portion of a human endogenous retroviral sequence that
is preferentially expressed in a tumor tissue, and to polypeptides encoded by such
nucleotide sequences. The nucleotide sequences and polypeptides may be used in
vaccines and pharmaceutical compositions for the prevention and treatment of cancer.
The polypeptides may also be used for the production of compounds, such as
antibodies, useful for diagnosing and monitoring the progression of cancer in a patient.

Background of the Invention 

In recent years, considerable research has been directed to the
identification of tumor markers, which may be useful for the diagnosis of particular
cancers, for predicting the outcome of the disease or for developing a therapy in a
patient-specific manner. Such research has generally focused on oncogenes, which are
normal cellular genes whose expression has been altered (e.g., by gene amplification,
increased transcription, alteration of mRNA splicing or mutation within the coding
region) such that otherwise normal cells assume neoplastic growth behavior. To date,
however, the established markers have had a limited utility, and their use often leads to
a result that is difficult to interpret.

Management of cancer currently relies on a combination of early
diagnosis and aggressive treatment, which may include one or more of a variety of
treatments such as surgery, radiotherapy, chemotherapy and hormone therapy.
However, current diagnostic methods often fail to detect a cancer until the disease has
progressed to a state that is difficult to treat, and existing treatments often have serious
side effects. The high mortality observed among cancer patients indicates that
improvements are needed in the diagnosis and treatment of the disease.

Accordingly, there is a need in the art for improved tumor markers, and
methods for therapy and diagnosis of cancer. The present invention fulfills these needs
and further provides other related advantages.
Summary of the Invention

Briefly stated, this invention provides compositions and methods for the
diagnosis and therapy of cancer. In one aspect, isolated DNA molecules are provided,
comprising: (a) a human endogenous retroviral sequence, wherein the retroviral
sequence is preferentially expressed in a tumor tissue; (b) a variant of the human
endogenous retroviral sequence that contains one or more nucleotide substitutions,
deletions, insertions and/or modifications at no more than 20% (preferably no more
than 5%) of the nucleotide positions, such that the antigenic and/or immunogenic
properties of the polypeptide encoded by the human endogenous retroviral sequence are
retained; or (c) a nucleotide sequence encoding an epitope of a polypeptide encoded by
at least one of the above sequences. Isolated DNA and RNA molecules comprising a
nucleotide sequence complementary to a DNA molecule as described above are also
provided.

In another aspect, the present invention provides an isolated DNA
molecule encoding an epitope of a polypeptide, the polypeptide being encoded by:
(a) a nucleotide sequence transcribed from the sequence of SEQ ID NO:11; or (b) a
variant of the nucleotide sequence that contains one or more nucleotide substitutions,
deletions, insertions and/or modifications at not more than 20% of the nucleotide
positions, such that the antigenic and/or immunogenic properties of the polypeptide
encoded by the nucleotide sequence are retained. Isolated DNA and RNA molecules
comprising a nucleotide sequence complementary to a DNA molecule as described
above are also provided.

In related aspects, the present invention provides recombinant
expression vectors comprising a DNA molecule as described above and host cells
transformed or transfected with such expression vectors.

In further aspects, polypeptides, comprising an amino acid sequence
encoded by a DNA molecule as described above, and monoclonal antibodies that bind
to such polypeptides are provided.

In another aspect, methods are provided for determining the presence of
a cancer in a patient. In one embodiment, the method comprises detecting, within a
biological sample obtained from a patient, a polypeptide as described above. In another
embodiment, the method comprises detecting, within a biological sample, an RNA
molecule encoding a polypeptide as described above. In yet another embodiment, the
method comprises (a) intradermally injecting a patient with a polypeptide as described
above; and (b) detecting an immune response on the patient's skin and therefrom
detecting the presence of a cancer in the patient.
In a related aspect, diagnostic kits useful in the determination of breast
cancer are provided. The diagnostic kits generally comprise one or more monoclonal
antibodies as described above, and a detection reagent. Within another related aspect,
the diagnostic kit comprises a first polymerase chain reaction primer and a second
polymerase chain reaction primer, the first and second primers each comprising at least
about 10 contiguous nucleotides of an RNA molecule encoding a polypeptide as
described above. Within yet another related aspect, the diagnostic kit comprises at least
one oligonucleotide probe, the probe comprising at least about 15 contiguous
nucleotides of a DNA molecule as described above. In another aspect, the present
invention provides methods for monitoring the progression of a cancer in a patient. In
one embodiment, the method comprises: (a) detecting an amount, in a biological
sample, of a polypeptide as described above; (b) subsequently repeating step (a); and
(c) comparing the amounts of polypeptide detected in steps (a) and (b), and therefrom
monitoring the progression of cancer in the patient. In another embodiment, the
method comprises (a) detecting an amount, within a biological sample, of an RNA
molecule encoding a polypeptide as described above; (b) subsequently repeating step
(a); and (c) comparing the amounts of RNA molecules detected in steps (a) and (b), and
therefrom monitoring the progression of cancer in the patient.
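
The length requirements recited for the kit primers (at least about 10 contiguous nucleotides) and probes (at least about 15) can be checked mechanically. The following is a minimal sketch under stated assumptions: both sequences are written in the DNA alphabet on the same strand, the function name `shares_contiguous_run` is hypothetical, and the target string is a short illustrative fragment rather than a complete sequence from the listing.

```python
def shares_contiguous_run(oligo: str, target: str, min_len: int) -> bool:
    """Return True if any window of `min_len` bases of `oligo` occurs in `target`.

    Both sequences are compared on the same strand; a primer designed against
    the opposite strand would first need to be reverse-complemented.
    """
    oligo, target = oligo.upper(), target.upper()
    if len(oligo) < min_len:
        return False
    return any(
        oligo[start:start + min_len] in target
        for start in range(len(oligo) - min_len + 1)
    )


# Illustrative fragment and candidate oligonucleotide (assumed for the example).
TARGET = "GACGATTTCCACCGGTATCTCCTCGTGGGTATTCAG"
CANDIDATE = "CCGGTATCTCCTCGTGGGTATT"

print(shares_contiguous_run(CANDIDATE, TARGET, 10))  # primer threshold
print(shares_contiguous_run(CANDIDATE, TARGET, 15))  # probe threshold
```
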

In other aspects, pharmaceutical compositions, which comprise a
polypeptide as described above and a physiologically acceptable carrier, and vaccines,
which comprise a polypeptide as described above and an immune response enhancer
are provided.

In related aspects, the present invention provides methods for inhibiting
the development of a cancer in a patient, comprising administering to a patient a
pharmaceutical composition or vaccine as described above.

These and other aspects of the present invention will become apparent
upon reference to the following detailed description and attached drawings. All
references disclosed herein are hereby incorporated by reference in their entirety as if
each was incorporated individually.

Brief Description of the Drawings 

Figure 1 shows the differential display PCR products, separated by gel
electrophoresis, obtained from cDNA prepared from normal breast tissue (lanes 1 and
2) and from cDNA prepared from breast tumor tissue from the same patient (lanes 3
and 4). The arrow indicates the band corresponding to B18Ag1.

Figure 2 is a northern blot comparing the level of B18Ag1 mRNA in
breast tumor tissue (lane 1) with the level in normal breast tissue.
Figure 3 shows the level of B18Ag1 mRNA in breast tumor tissue
compared to that in various normal and non-breast tumor tissues as determined by
RNase protection assays.

Figure 4 is a genomic clone map showing the location of additional
retroviral sequences (provided in SEQ ID NO:3 - SEQ ID NO:10) relative to B18Ag1.

Figures 5A and 5B show the sequencing strategy, genomic organization,
and predicted open reading frame for the retroviral element containing B18Ag1.

Figure 6 shows the nucleotide sequence of the representative human
endogenous retroviral element B18Ag1.

Detailed Description of the Invention 

As noted above, the present invention is generally directed to
compositions and methods for the diagnosis, monitoring and therapy of cancer. The
compositions described herein include polypeptides, nucleic acid sequences and
antibodies. Polypeptides of the present invention generally comprise at least a portion
of a protein that is encoded by a human endogenous retroviral sequence, wherein the
human endogenous retroviral sequence is expressed at substantially greater levels in a
human tumor tissue than in normal tissue (i.e., the level of RNA encoding the
polypeptide is at least two fold higher, and preferably at least five fold higher, in a
tumor tissue than in normal tissue). Such sequences are said to be "preferentially
expressed" in a tumor tissue. Any cancer characterized by increased expression of a
human endogenous retroviral sequence within a tumor may be detected and/or treated
according to the present invention. Representative cancers include breast cancer,
prostate cancer, leukemia, lymphoma and Kaposi's sarcoma. As used herein, the term
"polypeptide" encompasses amino acid chains of any length, including full length
proteins (and epitopes thereof) encoded by a human endogenous retroviral sequence.
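
The "preferentially expressed" criterion above is a fold-change threshold: at least two fold, and preferably at least five fold, more RNA in tumor tissue than in normal tissue. A minimal sketch of that comparison follows; the function names, and the assumption that both inputs are positive expression measurements on the same normalized scale, are illustrative and not taken from the disclosure.

```python
def fold_change(tumor_level: float, normal_level: float) -> float:
    """Ratio of tumor to normal RNA level (both on the same normalized scale)."""
    if normal_level <= 0:
        raise ValueError("normal_level must be positive to compute a ratio")
    return tumor_level / normal_level


def is_preferentially_expressed(
    tumor_level: float, normal_level: float, min_fold: float = 2.0
) -> bool:
    """Apply the fold-change threshold (2.0 by default, 5.0 for the preferred case)."""
    return fold_change(tumor_level, normal_level) >= min_fold


# Hypothetical normalized signal intensities, for illustration only.
print(is_preferentially_expressed(10.0, 4.0))                # 2.5-fold, default cutoff
print(is_preferentially_expressed(10.0, 4.0, min_fold=5.0))  # same data, stricter cutoff
```
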

Nucleic acid sequences of the subject invention generally comprise a
DNA or RNA sequence that encodes a polypeptide as described above, or that is
complementary to such a sequence. Antibodies are generally immune system proteins,
or fragments thereof, that are capable of binding to a portion of a polypeptide as
described above. Antibodies can be produced by cell culture techniques, including the
generation of monoclonal antibodies as described herein, or via transfection of antibody
genes into suitable bacterial or mammalian cell hosts, in order to allow for the
production of recombinant antibodies.

Polypeptides within the scope of this invention include, but are not
limited to, polypeptides (and epitopes thereof) encoded by the human endogenous
retroviral sequences described herein. Such sequences include the sequence designated
B18Ag1 (SEQ ID NO:1) as well as other sequences such as those recited in SEQ ID
NO:3 - SEQ ID NO:10, found within the retroviral genome containing B18Ag1 (SEQ ID
NO:11). B18Ag1 has homology to the P30 gene of the endogenous human retroviral
element S71, as described in Werner et al., Virology 174:225-238 (1990). As used
herein, the term "polypeptide" encompasses amino acid chains of any length, including
full length proteins encoded by a human endogenous retroviral element. A polypeptide
comprising an epitope of a human endogenous retroviral element may consist entirely
of the epitope, or may contain additional sequences. The additional sequences may be
derived from the native protein or may be heterologous, and such sequences may (but
need not) possess immunogenic or antigenic properties.

An "epitope," as used herein, is a portion of a polypeptide that is
recognized (i.e., specifically bound) by a B-cell and/or T-cell surface antigen receptor.
Epitopes may generally be identified using well known techniques, such as those
summarized in Paul, Fundamental Immunology, 3rd ed., 243-247 (Raven Press, 1993)
and references cited therein. Such techniques include screening polypeptides derived
from the native polypeptide for the ability to react with antigen-specific antisera and/or
T-cell lines or clones. An epitope of a polypeptide is a portion that reacts with such
antisera and/or T-cells at a level that is similar to the reactivity of the full length
polypeptide (e.g., in an ELISA and/or T-cell reactivity assay). Such screens may
generally be performed using methods well known to those of ordinary skill in the art,
such as those described in Harlow and Lane, Antibodies: A Laboratory Manual, Cold
Spring Harbor Laboratory, 1988. B-cell and T-cell epitopes may also be predicted via
computer analysis. Polypeptides comprising an epitope of a polypeptide that is
preferentially expressed in a tumor tissue (with or without additional amino acid
sequence) are within the scope of the present invention.

The compositions and methods of the present invention also encompass
variants of the above polypeptides and nucleic acid sequences encoding such
polypeptides. A polypeptide "variant," as used herein, is a polypeptide that differs from
the native polypeptide in substitutions and/or modifications such that the antigenic
and/or immunogenic properties of the polypeptide are retained. Such variants may
generally be identified by modifying one of the above polypeptide sequences and
evaluating the reactivity of the modified polypeptide with antisera and/or T-cells as
described above. Nucleic acid variants may contain one or more substitutions,
deletions, insertions and/or modifications such that the antigenic and/or immunogenic
properties of the encoded polypeptide are retained. One preferred variant of a human
endogenous retroviral sequence, or an epitope thereof, is a variant that contains
nucleotide substitutions, deletions, insertions and/or modifications at no more than 20%
of the nucleotide positions within the native polypeptide sequence.
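
The 20% limit on changed nucleotide positions can be illustrated with a short calculation. This is a sketch under stated assumptions, not the method of the disclosure: the two sequences are assumed to be already aligned to equal length with "-" marking gaps, every mismatched column (substitution, deletion or insertion) counts as one changed position, and the denominator is the number of native positions. The function names and example sequences are hypothetical.

```python
def fraction_changed(native_aln: str, variant_aln: str) -> float:
    """Fraction of native positions altered in a pre-aligned variant.

    Gaps are written as '-'. A mismatched column counts as one change,
    whether it is a substitution, a deletion or an insertion.
    """
    if len(native_aln) != len(variant_aln):
        raise ValueError("aligned sequences must have equal length")
    native_len = sum(1 for base in native_aln if base != "-")
    if native_len == 0:
        raise ValueError("native sequence is empty")
    changed = sum(
        1 for a, b in zip(native_aln.upper(), variant_aln.upper()) if a != b
    )
    return changed / native_len


def within_variant_limit(
    native_aln: str, variant_aln: str, limit: float = 0.20
) -> bool:
    """True if the variant changes no more than `limit` of the native positions."""
    return fraction_changed(native_aln, variant_aln) <= limit


NATIVE = "ATGGCTATTT"    # hypothetical 10-nucleotide native sequence
ONE_SUB = "ATGGCAATTT"   # one substitution: 10% of positions
THREE_CH = "AT-GCAATTA"  # one deletion and two substitutions: 30%

print(within_variant_limit(NATIVE, ONE_SUB))
print(within_variant_limit(NATIVE, THREE_CH))
```

Passing `limit=0.05` applies the stricter 5% preference stated in the Summary.
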

Preferably, a variant contains conservative substitutions. A
"conservative substitution" is one in which an amino acid is substituted for another
amino acid that has similar properties, such that one skilled in the art of peptide
chemistry would expect the secondary structure and hydropathic nature of the
polypeptide to be substantially unchanged. In general, the following groups of amino
acids represent conservative changes: (1) ala, pro, gly, glu, asp, gln, asn, ser, thr;
(2) cys, ser, tyr, thr; (3) val, ile, leu, met, ala, phe; (4) lys, arg, his; and (5) phe, tyr, trp,
his.
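
The five groups listed above can be encoded directly as sets, so that a proposed substitution is classed as conservative when both residues fall in at least one common group. The sketch below is illustrative only; the name `is_conservative` and the choice to treat an unchanged residue as "not a substitution" are assumptions.

```python
# The five groups of residues recited in the text, as lowercase three-letter codes.
CONSERVATIVE_GROUPS = [
    frozenset({"ala", "pro", "gly", "glu", "asp", "gln", "asn", "ser", "thr"}),
    frozenset({"cys", "ser", "tyr", "thr"}),
    frozenset({"val", "ile", "leu", "met", "ala", "phe"}),
    frozenset({"lys", "arg", "his"}),
    frozenset({"phe", "tyr", "trp", "his"}),
]


def is_conservative(original: str, replacement: str) -> bool:
    """True if two different residues share at least one of the listed groups."""
    a, b = original.lower(), replacement.lower()
    if a == b:
        return False  # no substitution took place
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


print(is_conservative("Lys", "Arg"))  # both in group (4)
print(is_conservative("Lys", "Asp"))  # no group in common
```

Because some residues (for example ala, ser, thr, phe, tyr and his) appear in more than one group, membership is tested group by group rather than by assigning each residue a single class.
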

Variants may also (or alternatively) be modified by, for example, the
deletion or addition of amino acids that have minimal influence on the immunogenic or
antigenic properties, secondary structure and hydropathic nature of the polypeptide.
For example, a polypeptide may be conjugated to a signal (or leader) sequence at the
N-terminal end of the protein which co-translationally or post-translationally directs
transfer of the protein. The polypeptide may also be conjugated to a linker or other
sequence for ease of synthesis, purification or identification of the polypeptide (e.g.,
poly-His), or to enhance binding of the polypeptide to a solid support. For example, a
polypeptide may be conjugated to an immunoglobulin Fc region.

Human endogenous retroviral sequences that are expressed at
substantially greater levels in a human tumor tissue than in normal tissue may be
prepared using any of several techniques. For example, the human endogenous
retroviral sequence designated B18Ag1 (Figure 6 and SEQ ID NO:1) may be cloned on
the basis of its breast tumor specific expression, using differential display PCR. This
technique compares the amplified products from poly A+ or total RNA template
prepared from normal and breast tumor tissue. cDNA may be prepared by reverse
transcription of RNA using a (dT)12AG primer. Following amplification using the
primer CCTCAACCTC (SEQ ID NO:13), a band corresponding to an amplified
product specific to the tumor RNA may be cut out from a silver stained gel and
subcloned into a suitable vector (e.g., the T-vector, Novagen, Madison, WI).

Alternatively, the B18Ag1 gene (or a portion thereof) may be amplified
from human genomic DNA, or from breast tumor cDNA, via polymerase chain
reaction. For this approach, B18Ag1 sequence-specific primers may be designed based
on the sequence provided in SEQ ID NO:1, and may be purchased or synthesized. One
suitable primer pair for amplification from breast tumor cDNA is (5'ATG GCT ATT
TTC GGG GGC TGA CA) (SEQ ID NO:14) and (5'CCG GTA TCT CCT CGT GGG
TAT T) (SEQ ID NO:15). An amplified portion of B18Ag1 may then be used to isolate
the full length gene from a human genomic DNA library or from a breast tumor cDNA
library, using well known techniques such as those described in Sambrook et al.,
Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratories, Cold
Spring Harbor, NY (1989). Other sequences within the retroviral genome containing
B18Ag1, such as those recited in SEQ ID NO:3 - SEQ ID NO:10, may be similarly
prepared by screening human genomic libraries using B18Ag1-specific sequences as
probes.
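
When designing sequence-specific primers it is useful to confirm where, and on which strand, a primer matches the template. The sketch below is an illustration under stated assumptions: the template is a short fragment read from the sequence listing, the primer is the one recited as SEQ ID NO:15, and the helper names `reverse_complement` and `locate_primer` are hypothetical. Only exact matches are found; mismatch-tolerant searching is not attempted.

```python
from typing import Optional, Tuple

# Translation table mapping each base to its Watson-Crick complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def locate_primer(primer: str, template: str) -> Optional[Tuple[str, int]]:
    """Find an exact primer site on the given template strand.

    Returns ("sense", index) if the primer sequence itself occurs in the
    template, ("antisense", index) if its reverse complement occurs there
    (i.e. the primer anneals to the written strand), or None if neither does.
    """
    primer, template = primer.upper(), template.upper()
    index = template.find(primer)
    if index >= 0:
        return ("sense", index)
    index = template.find(reverse_complement(primer))
    if index >= 0:
        return ("antisense", index)
    return None


# Fragment of the listed coding sequence and the SEQ ID NO:15 primer.
FRAGMENT = "GACGATTTCCACCGGTATCTCCTCGTGGGTATTCAG"
PRIMER_15 = "CCGGTATCTCCTCGTGGGTATT"

print(locate_primer(PRIMER_15, FRAGMENT))
print(locate_primer(reverse_complement(PRIMER_15), FRAGMENT))
```

A primer that reads the same as the written strand is reported as "sense"; the second primer of a pair would normally be reported as "antisense" against the same strand.
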

Other human endogenous retroviral sequences that are expressed at
substantially greater levels in a human tumor tissue than in normal tissue may be
prepared using methods known to those of ordinary skill in the art. For example, such
sequences may be identified using low stringency hybridization, followed by PCR to
identify conserved motifs. The level of expression in tumor tissue may generally be
evaluated using the methods described herein, such as PCR and Northern blot analysis.

Recombinant polypeptides encoded by the DNA sequences described
above may be readily prepared from the DNA sequences. For example, supernatants
from suitable host/vector systems which secrete recombinant polypeptide into culture
media may be first concentrated using a commercially available filter. Following
concentration, the concentrate may be applied to a suitable purification matrix such as
an affinity matrix or an ion exchange resin. Finally, one or more reverse phase HPLC
steps can be employed to further purify a recombinant polypeptide.

In general, any of a variety of expression vectors known to those of
ordinary skill in the art may be employed to express recombinant polypeptides of this
invention. Expression may be achieved in any appropriate host cell that has been
transformed or transfected with an expression vector containing a DNA molecule that
encodes a recombinant polypeptide. Suitable host cells include prokaryotes, yeast and
higher eukaryotic cells. Preferably, the host cells employed are E. coli, yeast or a
mammalian cell line such as COS or CHO.

Such techniques may also be used to prepare polypeptides comprising
epitopes or variants of the native polypeptides. For example, variants of a native
polypeptide may generally be prepared using standard mutagenesis techniques, such as
oligonucleotide-directed site-specific mutagenesis, and sections of the DNA sequence
may be removed to permit preparation of truncated polypeptides. Portions and other
variants having fewer than about 100 amino acids, and generally fewer than about 50
amino acids, may also be generated by synthetic means, using techniques well known
to those of ordinary skill in the art. For example, such polypeptides may be synthesized
using any of the commercially available solid-phase techniques, such as the Merrifield
solid-phase synthesis method, where amino acids are sequentially added to a growing



WO 97/25431

PCT/US97/00398



amino acid chain. See Merrifield, J. Am. Chem. Soc. 85:2149-2154, 1963. Equipment
for automated synthesis of polypeptides is commercially available from suppliers such
as Applied BioSystems, Inc., Foster City, CA, and may be operated according to the
manufacturer's instructions.

In specific embodiments, polypeptides of the present invention
encompass polypeptides encoded by a human endogenous retroviral sequence that is
expressed at substantially greater levels in a human tumor tissue than in normal tissue
(such as the sequence recited in SEQ ID NO:1), variants of such polypeptides that are
encoded by DNA molecules containing one or more nucleotide substitutions, deletions,
insertions and/or modifications at no more than 20% of the nucleotide positions, and
epitopes of the above polypeptides. Polypeptides within the scope of the present
invention also include polypeptides (and epitopes thereof) encoded by DNA sequences
that hybridize to the above sequences under stringent conditions, wherein the DNA
sequences are at least 80% identical in overall sequence to the sequence recited in SEQ
ID NO:1, and wherein RNA corresponding to said nucleotide sequence is expressed at a
greater level in human tumor tissue than in the corresponding normal tissue. As used
herein, "stringent conditions" refers to prewashing in a solution of 6X SSC, 0.2% SDS;
hybridizing overnight at 65°C in 6X SSC, 0.2% SDS; followed by washing twice at
65°C for 30 minutes each with 1X SSC, 0.1% SDS, and then washing twice at 65°C for
30-60 minutes each with 0.1X SSC, 0.1% SDS. DNA molecules according to the present
invention include molecules that encode any of the above polypeptides.

In another aspect of the present invention, antibodies are provided. Such
antibodies may be prepared by any of a variety of techniques known to those of
ordinary skill in the art. See, e.g., Harlow and Lane, Antibodies: A Laboratory
Manual, Cold Spring Harbor Laboratory, 1988. In one such technique, an immunogen
comprising the polypeptide is initially injected into any of a wide variety of mammals
(e.g., mice, rats, rabbits, sheep or goats). In this step, the polypeptides of this invention
may serve as the immunogen without modification. Alternatively, particularly for
relatively short polypeptides, a superior immune response may be elicited if the
polypeptide is joined to a carrier protein, such as bovine serum albumin or keyhole
limpet hemocyanin. The immunogen is injected into the animal host, preferably
according to a predetermined schedule incorporating one or more booster
immunizations, and the animals are bled periodically. Polyclonal antibodies specific
for the polypeptide may then be purified from such antisera by, for example, affinity
chromatography using the polypeptide coupled to a suitable solid support.

Monoclonal antibodies specific for the antigenic polypeptide of interest
may be prepared, for example, using the technique of Kohler and Milstein, Eur. J.
Immunol. 6:511-519, 1976, and improvements thereto. Briefly, these methods involve
the preparation of immortal cell lines capable of producing antibodies having the
desired specificity (i.e., reactivity with the polypeptide of interest). Such cell lines may
be produced, for example, from spleen cells obtained from an animal immunized as
described above. The spleen cells are then immortalized by, for example, fusion with a
myeloma cell fusion partner, preferably one that is syngeneic with the immunized
animal. A variety of fusion techniques may be employed. For example, the spleen
cells and myeloma cells may be combined with a nonionic detergent for a few minutes
and then plated at low density on a selective medium that supports the growth of hybrid
cells, but not myeloma cells. A preferred selection technique uses HAT (hypoxanthine,
aminopterin, thymidine) selection. After a sufficient time, usually about 1 to 2 weeks,
colonies of hybrids are observed. Single colonies are selected and their culture
supernatants tested for binding activity against the polypeptide. Hybridomas having
high reactivity and specificity are preferred.

Monoclonal antibodies may be isolated from the supernatants of
growing hybridoma colonies. In addition, various techniques may be employed to
enhance the yield, such as injection of the hybridoma cell line into the peritoneal cavity
of a suitable vertebrate host, such as a mouse. Monoclonal antibodies may then be
harvested from the ascites fluid or the blood. Contaminants may be removed from the
antibodies by conventional techniques, such as chromatography, gel filtration,
precipitation, and extraction. The polypeptides of this invention may be used in the
purification process in, for example, an affinity chromatography step.

Antibodies may be used, for example, in methods for detecting a cancer
(such as breast cancer, prostate cancer, leukemia, lymphoma or Kaposi's sarcoma) in a
patient. Such methods involve using one or more antibodies to detect the presence or
absence of a polypeptide as described herein in a suitable biological sample. As used
herein, suitable biological samples include tumor or normal tissue biopsy, mastectomy,
blood, lymph node, serum and urine samples or other tissue, homogenate or extract
thereof, obtained from a patient. It will be evident to those of ordinary skill in the art
that, following detection of a polypeptide within a non-biopsy sample, additional tumor
markers may be employed to identify the particular type of cancer.

There are a variety of assay formats known to those of ordinary skill in
the art for using an antibody to detect polypeptide markers in a sample. See, e.g.,
Harlow and Lane, Antibodies: A Laboratory Manual, Cold Spring Harbor Laboratory,
1988. For example, the assay may be performed in a Western blot format, wherein a
protein preparation from the biological sample is submitted to gel electrophoresis,
transferred to a suitable membrane and allowed to react with antibody. The presence of
antibody on the membrane may then be detected using a suitable detection reagent, as
described below.

In another embodiment, the assay involves the use of an antibody
immobilized on a solid support to bind to the polypeptide and remove it from the
remainder of the sample. The bound polypeptide may then be detected using a second
antibody that binds to the binding partner/polypeptide complex and contains a reporter
group. Alternatively, a competitive assay may be utilized, in which a polypeptide is
labeled with a reporter group and allowed to bind to the immobilized antibody after
incubation of the antibody with the sample. The extent to which components of the
sample inhibit the binding of the labeled polypeptide to the antibody is indicative of the
reactivity of the sample with the immobilized antibody, and as a result is indicative of
the concentration of polypeptide in the sample.
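
In the competitive format, the readout is the reduction in labeled-polypeptide signal caused by the sample. One common way to express this, shown here as a sketch and not as a procedure stated in the disclosure, is percent inhibition relative to a control incubated without sample. The function name and the example readings are hypothetical; converting inhibition into a concentration would additionally require a standard curve, which is not shown.

```python
def percent_inhibition(control_signal: float, sample_signal: float) -> float:
    """Percent reduction in labeled-polypeptide binding caused by the sample.

    control_signal: reporter signal when the antibody was not pre-incubated
                    with sample (maximum binding of labeled polypeptide).
    sample_signal:  reporter signal after pre-incubation with the sample.
    """
    if control_signal <= 0:
        raise ValueError("control_signal must be positive")
    return 100.0 * (1.0 - sample_signal / control_signal)


# Hypothetical absorbance readings, for illustration only.
print(percent_inhibition(control_signal=1.0, sample_signal=0.25))
```

Higher inhibition corresponds to more unlabeled polypeptide in the sample competing for the immobilized antibody.
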

The solid support may be any material known to those of ordinary skill
in the art to which the antibody may be attached. For example, the solid support may
be a test well in a microtiter plate or a nitrocellulose filter or other suitable membrane.
Alternatively, the support may be a bead or disc, such as glass, fiberglass, latex or a
plastic material such as polystyrene or polyvinylchloride. The support may also be a
magnetic particle or a fiber optic sensor, such as those disclosed, for example, in U.S.
Patent No. 5,359,681.

The antibody may be immobilized on the solid support using a variety of
techniques known to those in the art, which are amply described in the patent and
scientific literature. In the context of the present invention, the term "immobilization"
refers to both noncovalent association, such as adsorption, and covalent attachment
(which may be a direct linkage between the antigen and functional groups on the
support or may be a linkage by way of a cross-linking agent). Immobilization by
adsorption to a well in a microtiter plate or to a membrane is preferred. In such cases,
adsorption may be achieved by contacting the antibody, in a suitable buffer, with the
solid support for a suitable amount of time. The contact time varies with temperature,
but is typically between about 1 hour and 1 day. In general, contacting a well of a
plastic microtiter plate (such as polystyrene or polyvinylchloride) with an amount of
antibody ranging from about 10 ng to about 1 µg, and preferably about 100-200 ng, is
sufficient to immobilize an adequate amount of polypeptide.

Covalent attachment of antibody to a solid support may generally be
achieved by first reacting the support with a bifunctional reagent that will react with
both the support and a functional group, such as a hydroxyl or amino group, on the
antibody. For example, the antibody may be covalently attached to supports having an
appropriate polymer coating using benzoquinone or by condensation of an aldehyde
group on the support with an amine and an active hydrogen on the binding partner (see,
e.g., Pierce Immunotechnology Catalog and Handbook (1991) at A12-A13).

In certain embodiments for detection of polypeptide in a sample, the
assay is a two-antibody sandwich assay. This assay may be performed by first
contacting an antibody that has been immobilized on a solid support, commonly the
well of a microtiter plate, with the biological sample, such that polypeptide within
the sample is allowed to bind to the immobilized antibody. Unbound sample is then
removed from the immobilized polypeptide-antibody complexes and a second antibody
(containing a reporter group) capable of binding to a different site on the polypeptide is
added. The amount of second antibody that remains bound to the solid support is then
determined using a method appropriate for the specific reporter group.

More specifically, once the antibody is immobilized on the support as
described above, the remaining protein binding sites on the support are typically
blocked. Any suitable blocking agent known to those of ordinary skill in the art may be
used, such as bovine serum albumin or Tween 20™ (Sigma Chemical Co., St. Louis, MO). The
immobilized antibody is then incubated with the sample, and polypeptide is allowed to
bind to the antibody. The sample may be diluted with a suitable diluent, such as
phosphate-buffered saline (PBS), prior to incubation. In general, an appropriate contact
time (i.e., incubation time) is that period of time that is sufficient to detect the presence
of polypeptide within a sample obtained from an individual with breast cancer.
Preferably, the contact time is sufficient to achieve a level of binding that is at least
95% of that achieved at equilibrium between bound and unbound polypeptide. Those
of ordinary skill in the art will recognize that the time necessary to achieve equilibrium
may be readily determined by assaying the level of binding that occurs over a period of
time. At room temperature, an incubation time of about 30 minutes is generally
sufficient.
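
The time-course determination described above can be sketched in code. This is a minimal illustration only, assuming that binding signals have been read at several incubation times and that the final reading approximates the equilibrium plateau; the function name and the example readings are hypothetical and are not taken from this specification.

```python
def time_to_fraction_of_equilibrium(times_min, signals, fraction=0.95):
    """Return the earliest sampled time at which the binding signal reaches
    `fraction` of the plateau, or None if no reading reaches it."""
    if len(times_min) != len(signals) or not signals:
        raise ValueError("times and signals must be non-empty and of equal length")
    plateau = signals[-1]  # assumption: the last reading approximates equilibrium
    threshold = fraction * plateau
    for t, s in zip(times_min, signals):
        if s >= threshold:
            return t
    return None


# Hypothetical time course: minutes of incubation and arbitrary signal units.
times = [5, 10, 20, 30, 60, 120]
signals = [0.40, 0.65, 0.88, 0.96, 0.99, 1.00]
print(time_to_fraction_of_equilibrium(times, signals))
```

With denser sampling, or with a fitted binding curve in place of the raw readings, the same comparison against 95% of the plateau would apply.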

Unbound sample may then be removed by washing the solid support
with an appropriate buffer, such as PBS containing 0.1% Tween 20™. The second
antibody, which contains a reporter group, may then be added to the solid support.
Preferred reporter groups include enzymes (such as horseradish peroxidase), substrates,
cofactors, inhibitors, dyes, radionuclides, luminescent groups, fluorescent groups and
biotin. The conjugation of antibody to reporter group may be achieved using standard
methods known to those of ordinary skill in the art.

The second antibody is then incubated with the immobilized antibody-
polypeptide complex for an amount of time sufficient to detect the bound polypeptide.
An appropriate amount of time may generally be determined by assaying the level of
binding that occurs over a period of time. Unbound second antibody is then removed
and bound second antibody is detected using the reporter group. The method employed
for detecting the reporter group depends upon the nature of the reporter group. For
radioactive groups, scintillation counting or autoradiographic methods are generally
appropriate. Spectroscopic methods may be used to detect dyes, luminescent groups
and fluorescent groups. Biotin may be detected using avidin, coupled to a different
reporter group (commonly a radioactive or fluorescent group or an enzyme). Enzyme
reporter groups may generally be detected by the addition of substrate (generally for a
specific period of time), followed by spectroscopic or other analysis of the reaction
products.

To determine the presence or absence of a cancer, the signal detected
from the reporter group that remains bound to the solid support is generally compared
to a signal that corresponds to a predetermined cut-off value. In one preferred
embodiment, the cut-off value is the average mean signal obtained when the
immobilized antibody is incubated with samples from patients without cancer. In
general, a sample generating a signal that is three standard deviations above the
predetermined cut-off value may be considered positive for a cancer. In an alternate
preferred embodiment, the cut-off value is determined using a Receiver Operator
Curve, according to the method of Sackett et al., Clinical Epidemiology: A Basic
Science for Clinical Medicine, p. 106-7 (Little Brown and Co., 1985). Briefly, in this
embodiment, the cut-off value may be determined from a plot of pairs of true positive
rates (i.e., sensitivity) and false positive rates (100%-specificity) that correspond to
each possible cut-off value for the diagnostic test result. The cut-off value on the plot
that is the closest to the upper left-hand corner (i.e., the value that encloses the largest
area) is the most accurate cut-off value, and a sample generating a signal that is higher
than the cut-off value determined by this method may be considered positive.
Alternatively, the cut-off value may be shifted to the left along the plot, to minimize the
false positive rate, or to the right, to minimize the false negative rate. In general, a
sample generating a signal that is higher than the cut-off value determined by this
method is considered positive for a cancer.
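
The two cut-off strategies described above can be sketched as follows. This is an illustrative outline only, assuming assay signals are available for samples from patients without cancer and from patients with cancer; the function names and the example readings are hypothetical, and the upper left-hand corner criterion is implemented here as the Euclidean distance from the point (false positive rate 0, true positive rate 1).

```python
import math
from statistics import mean, stdev


def mean_plus_3sd_threshold(control_signals):
    """Signal level three standard deviations above the mean signal
    obtained from samples of patients without cancer."""
    return mean(control_signals) + 3 * stdev(control_signals)


def roc_cutoff(control_signals, cancer_signals):
    """Return (cutoff, true positive rate, false positive rate) for the
    candidate cut-off whose ROC point lies closest to the upper left-hand
    corner. A sample is positive when its signal is higher than the cut-off."""
    best = None
    for cutoff in sorted(set(control_signals) | set(cancer_signals)):
        tpr = sum(s > cutoff for s in cancer_signals) / len(cancer_signals)
        fpr = sum(s > cutoff for s in control_signals) / len(control_signals)
        distance = math.hypot(fpr, 1.0 - tpr)
        if best is None or distance < best[0]:
            best = (distance, cutoff, tpr, fpr)
    return best[1], best[2], best[3]


# Hypothetical absorbance readings, for illustration only.
controls = [0.10, 0.12, 0.15, 0.11, 0.13, 0.30]
cancers = [0.25, 0.40, 0.55, 0.60, 0.35, 0.14]
print(mean_plus_3sd_threshold(controls))
print(roc_cutoff(controls, cancers))
```

Shifting the selected cut-off upward or downward from this point trades false positives against false negatives, as noted above.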

In a related embodiment, the assay is performed in a flow-through or
strip test format, wherein the antibody is immobilized on a membrane, such as
nitrocellulose. In the flow-through test, the polypeptide within the sample binds to the
immobilized antibody as the sample passes through the membrane. A second, labeled
antibody then binds to the antibody-polypeptide complex as a solution containing the
second antibody flows through the membrane. The detection of bound second antibody
may then be performed as described above. In the strip test format, one end of the
membrane to which antibody is bound is immersed in a solution containing the sample.
The sample migrates along the membrane through a region containing second antibody
and to the area of immobilized antibody. Concentration of second antibody at the area
of immobilized antibody indicates the presence of breast cancer. Typically, the
concentration of second antibody at that site generates a pattern, such as a line, that can
be read visually. The absence of such a pattern indicates a negative result. In general,
the amount of antibody immobilized on the membrane is selected to generate a visually
discernible pattern when the biological sample contains a level of polypeptide that
would be sufficient to generate a positive signal in the two-antibody sandwich assay, in
the format discussed above. Preferably, the amount of antibody immobilized on the
membrane ranges from about 25 ng to about 1 µg, and more preferably from about 50
ng to about 1 µg. Such tests can typically be performed with a very small amount of
biological sample.

The presence or absence of a cancer in a patient may also be determined
by evaluating the level of mRNA encoding a polypeptide of the present invention
within the biological sample (e.g., a biopsy, mastectomy and/or blood sample from a
patient) relative to a predetermined cut-off value. Such an evaluation may be achieved
using any of a variety of methods known to those of ordinary skill in the art such as, for
example, in situ hybridization and amplification by polymerase chain reaction. For
example, polymerase chain reaction may be used to amplify sequences from cDNA
prepared from RNA that is isolated from one of the above biological samples.
Sequence-specific primers for use in such amplification may be designed based on a
cDNA or genomic sequence, such as a sequence provided in SEQ ID NO:1 or SEQ ID
NO:3 - SEQ ID NO:10, and may be purchased or synthesized. In the case of B18Ag1,
as noted herein, one suitable primer pair is (5'ATG GCT ATT TTC GGG GGC TGA
CA) (SEQ ID NO:14) and (5'CCG GTA TCT CCT CGT GGG TAT T) (SEQ ID
NO:15). The PCR reaction products may then be separated and visualized using gel
electrophoresis, according to methods well known to those of ordinary skill in the art.
Amplification is typically performed on samples obtained from matched pairs of tissue
(tumor and non-tumor tissue from the same individual) or from unmatched pairs of
tissue (tumor and non-tumor tissue from different individuals). The amplification
reaction is preferably performed on several dilutions of cDNA spanning two orders of
magnitude. A two-fold or greater increase in expression in several dilutions of the
tumor sample as compared to the same dilution of the non-tumor sample is considered
positive.
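
The dilution-series comparison described above can be sketched as follows. This is a minimal illustration, assuming band intensities have been quantified for tumor and non-tumor cDNA at the same dilutions. The text does not define "several", so the `min_dilutions` parameter is a hypothetical choice, as are the function name and the example intensities.

```python
def is_expression_positive(tumor_by_dilution, normal_by_dilution,
                           fold=2.0, min_dilutions=2):
    """Return True when the tumor signal is at least `fold` times the
    non-tumor signal at `min_dilutions` or more matched dilutions."""
    hits = 0
    for dilution, tumor_signal in tumor_by_dilution.items():
        normal_signal = normal_by_dilution.get(dilution)
        if normal_signal is None:
            continue  # only dilutions measured in both samples are compared
        if normal_signal == 0:
            elevated = tumor_signal > 0
        else:
            elevated = tumor_signal / normal_signal >= fold
        if elevated:
            hits += 1
    return hits >= min_dilutions


# Hypothetical band intensities at 1:1, 1:10 and 1:100 cDNA dilutions
# (a series spanning two orders of magnitude).
tumor = {1: 900.0, 10: 120.0, 100: 15.0}
normal = {1: 300.0, 10: 40.0, 100: 12.0}
print(is_expression_positive(tumor, normal))
```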

Conventional RT-PCR protocols using agarose and ethidium bromide
staining, while important in defining gene specificity, do not lend themselves to
diagnostic kit development because of the time and effort required in making them
quantitative (i.e., construction of saturation and/or titration curves), and their sample
throughput. This problem is overcome by the development of procedures such as real
time RT-PCR, which allows for assays to be performed in single tubes, and in turn can
be modified for use in 96 well plate formats. Instrumentation to perform such
methodologies is available from ABI/Perkin Elmer. Alternatively, other high
throughput assays using labelled probes (e.g., digoxygenin) in combination with
labelled (e.g., enzyme, fluorescent, radioactive) antibodies to such probes can also be
used in the development of 96 well plate assays.

In yet another method for determining the presence or absence of a
cancer in a patient, one or more of the polypeptides described above may be used in a
skin test. As used herein, a "skin test" is any assay performed directly on a patient in
which a delayed-type hypersensitivity (DTH) reaction (such as swelling, reddening or
dermatitis) is measured following intradermal injection of one or more polypeptides as
described above. Such injection may be achieved using any suitable device sufficient
to contact the polypeptide or polypeptides with dermal cells of the patient, such as a
tuberculin syringe or 1 mL syringe. Preferably, the reaction is measured at least 48
hours after injection, more preferably 48-72 hours.

The DTH reaction is a cell-mediated immune response, which is greater
in patients that have been exposed previously to a test antigen (i.e., an immunogenic
portion of a polypeptide employed, or a variant thereof). The response may be measured
visually, using a ruler. In general, a response that is greater than about 0.5 cm in
diameter, preferably greater than about 1.0 cm in diameter, is a positive response,
indicative of a cancer. As noted above, additional tumor markers may be employed,
using methods known to those of ordinary skill in the art, to identify the type of cancer
present.

The polypeptides of this invention are preferably formulated, for use in a
skin test, as pharmaceutical compositions containing at least one polypeptide and a
physiologically acceptable carrier, such as water, saline, alcohol, or a buffer. Such
compositions typically contain one or more of the above polypeptides in an amount
ranging from about 1 µg to 100 µg, preferably from about 10 µg to 50 µg in a volume
of 0.1 mL. Preferably, the carrier employed in such pharmaceutical compositions is a
saline solution with appropriate preservatives, such as phenol and/or Tween 80™.

In other aspects of the present invention, the progression and/or response
to treatment of a cancer may be monitored by performing any of the above assays over
a period of time, and evaluating the change in the level of the response (i.e., the amount
of polypeptide or mRNA detected or, in the case of a skin test, the extent of the immune
response detected). For example, the assays may be performed every 1-2 months for a
period of 1-2 years. In general, a cancer is progressing in those patients in whom the
level of the response increases over time. In contrast, a cancer is not progressing when
the signal detected either remains constant or decreases with time.

In further aspects of the present invention, the compounds described
herein may be used for the immunotherapy of a cancer. In these aspects, the
compounds (which may be polypeptides, antibodies or nucleic acid molecules) are
preferably incorporated into pharmaceutical compositions or vaccines. Pharmaceutical
compositions comprise one or more such compounds and a physiologically acceptable
carrier. Vaccines may comprise one or more polypeptides and an immune response
enhancer, such as an adjuvant or a liposome (into which the compound is incorporated).
Pharmaceutical compositions and vaccines may additionally contain a delivery system,
such as biodegradable microspheres which are disclosed, for example, in U.S. Patent
Nos. 4,897,268 and 5,075,109. Pharmaceutical compositions and vaccines within the
scope of the present invention may also contain other compounds, including one or
more separate polypeptides.

Alternatively, a vaccine may contain DNA encoding one or more of the
polypeptides as described above, such that the polypeptide is generated in situ. In such
vaccines, the DNA may be present within any of a variety of delivery systems known to
those of ordinary skill in the art, including nucleic acid expression systems, bacteria and
viral expression systems. Appropriate nucleic acid expression systems contain the
necessary DNA sequences for expression in the patient (such as a suitable promoter and
terminating signal). Bacterial delivery systems involve the administration of a
bacterium (such as Bacillus-Calmette-Guerin) that expresses an immunogenic portion
of the polypeptide on its cell surface. In a preferred embodiment, the DNA may be
introduced using a viral expression system (e.g., vaccinia or other pox virus, retrovirus,
or adenovirus), which may involve the use of a non-pathogenic (defective), replication
competent virus. Techniques for incorporating DNA into such expression systems are
well known to those of ordinary skill in the art. The DNA may also be "naked," as
described, for example, in Ulmer et al., Science 259:1745-1749 (1993) and reviewed by
Cohen, Science 259:1691-1692 (1993). The uptake of naked DNA may be increased
by coating the DNA onto biodegradable beads, which are efficiently transported into
the cells.

While any suitable carrier known to those of ordinary skill in the art may
be employed in the pharmaceutical compositions of this invention, the type of carrier
will vary depending on the mode of administration. For parenteral administration, such
as subcutaneous injection, the carrier preferably comprises water, saline, alcohol, a fat,
a wax or a buffer. For oral administration, any of the above carriers or a solid carrier,
such as mannitol, lactose, starch, magnesium stearate, sodium saccharine, talcum,
cellulose, glucose, sucrose, and magnesium carbonate, may be employed.
Biodegradable microspheres (e.g., polylactate polyglycolate) may also be employed as
carriers for the pharmaceutical compositions of this invention.

Any of a variety of adjuvants may be employed in the vaccines of this
invention to nonspecifically enhance the immune response. Most adjuvants contain a
substance designed to protect the antigen from rapid catabolism, such as aluminum
hydroxide or mineral oil, and a nonspecific stimulator of immune responses, such as
lipid A, Bordetella pertussis or Mycobacterium tuberculosis-derived proteins. Suitable
adjuvants are commercially available as, for example, Freund's Incomplete Adjuvant
and Complete Adjuvant (Difco Laboratories, Detroit, MI), Merck Adjuvant 65 (Merck
and Company, Inc., Rahway, NJ), alum, biodegradable microspheres, monophosphoryl
lipid A and quil A. Cytokines, such as GM-CSF or interleukin-2, -7, or -12, may also
be used as adjuvants.

The above pharmaceutical compositions and vaccines may be used, for
example, for the therapy of cancer in a patient. As used herein, a "patient" refers to any
warm-blooded animal, preferably a human. A patient may or may not be afflicted with
a cancer. Accordingly, the above pharmaceutical compositions and vaccines may be
used to prevent the development of a cancer or to treat a patient afflicted with a cancer.
To prevent the development of a cancer, a pharmaceutical composition or vaccine
comprising one or more polypeptides as described herein (or naked, plasmid or viral
vector DNA encoding such a polypeptide) may be administered to a patient. For
treating a patient with a cancer, the pharmaceutical composition or vaccine may
comprise one or more polypeptides, antibodies or nucleic acid molecules
complementary to DNA encoding a polypeptide as described herein (e.g., antisense
RNA or antisense deoxyribonucleotide oligonucleotides).

For example, tumor cells that express a polypeptide as described herein
may be preferentially killed by administering to a patient a conjugate in which a
cytotoxic agent or "prodrug" is linked to antisense RNA, an antisense
deoxyribonucleotide oligonucleotide or an antibody that binds to such a polypeptide.
As used herein, the term "prodrug" refers to a group that is not itself toxic to the cell,
but that can be rendered toxic after the conjugate is directed to the target cell by the
addition of a second activating compound, such as an enzyme that can convert the
prodrug into an active drug. Any suitable cytotoxic agent (including radionuclides) or
prodrug known to those of ordinary skill in the art may be employed in such methods.
Suitable prodrugs include boron, doxifluridine, or the prodrug precursor of palytoxin.

Routes and frequency of administration, as well as dosage, will vary
from individual to individual. In general, the pharmaceutical compositions and
vaccines may be administered by injection (e.g., intracutaneous, intramuscular,
intravenous or subcutaneous), intranasally (e.g., by aspiration) or orally. Between 1
and 10 doses may be administered for a 52 week period. Preferably, 6 doses are
administered, at intervals of one month, and booster vaccinations may be given
periodically thereafter. Alternate protocols may be appropriate for individual patients.
A suitable dose is an amount of a compound that, when administered as described
above, is capable of promoting an anti-tumor immune response. Such a response can
be monitored by measuring the level of anti-tumor antibodies in a patient or by vaccine-
dependent generation of cytolytic effector cells capable of killing the patient's tumor
cells in vitro. A suitable dose should also be capable of causing an immune response
that leads to an improved clinical outcome (e.g., more frequent remissions, complete or
partial or longer disease-free survival) in vaccinated patients as compared to non-
vaccinated patients. In general, for pharmaceutical compositions and vaccines
comprising one or more polypeptides, the amount of each polypeptide present in a dose
ranges from about 100 µg to about 5 mg. Suitable dose sizes will vary with the size of
the patient, but will typically range from about 0.1 mL to about 5 mL.

The following Examples are offered by way of illustration and not by
way of limitation.

EXAMPLES

Example 1

Preparation of B18Ag1 cDNA and Genomic Clones Using Differential Display RT-PCR

This Example illustrates the preparation of cDNA and genomic DNA
molecules encoding B18Ag1 using a differential display screen.

Tissue samples were prepared from breast tumor and normal tissue of a
patient with breast cancer that was confirmed by pathology after removal from the
patient. Normal RNA and tumor RNA were extracted from the samples and mRNA was
isolated and converted into cDNA using a (dT)12AG anchored 3' primer. Differential
display PCR was then executed using a randomly chosen primer (CTTCAACCTC)
(SEQ ID NO:16). Amplification conditions were standard buffer containing 1.5 mM
MgCl2, 20 pmol of primer, 500 pmol dNTP, and 1 unit of Taq DNA polymerase
(Perkin-Elmer, Branchburg, NJ). Forty cycles of amplification were performed using
94°C denaturation for 30 seconds, 42°C annealing for 1 minute, and 72°C extension for
30 seconds. An RNA fingerprint containing 76 amplified products was obtained.
Although the RNA fingerprint of breast tumor tissue was over 98% identical to that of
the normal breast tissue, a band was repeatedly observed to be specific to the RNA
fingerprint pattern of the tumor. This band was cut out of a silver stained gel and
subcloned into the T-vector (Novagen, Madison, WI) and sequenced.

The sequence of the cDNA, referred to as B18Ag1, is provided in SEQ
ID NO:1. A database search of GENBANK and EMBL revealed that the B18Ag1
fragment initially cloned is 77% identical to the endogenous human retroviral element
S71, which is a truncated retroviral element homologous to the Simian Sarcoma Virus
(SSV). S71 contains a complete gag gene, a portion of the pol gene and an LTR-like
structure at the 3' terminus (see Werner et al., Virology 174:225-238 (1990)). B18Ag1
is also 64% identical to SSV in the region corresponding to the P30 (gag) locus.
B18Ag1 contains three separate and incomplete reading frames covering a region which
shares considerable homology to a wide variety of gag proteins of retroviruses which
infect mammals. In addition, the homology to S71 is not just within the gag gene, but
spans several kb of sequence including an LTR.

B18Ag1-specific PCR primers were synthesized using computer
analysis guidelines. RT-PCR amplification (94°C, 30 seconds; 60°C to 42°C, 30
seconds; 72°C, 30 seconds, for 40 cycles) confirmed that B18Ag1 represents an actual
mRNA sequence present at relatively high levels in the patient's breast tumor tissue.
The primers used in amplification were B18Ag1-1 (CTG CCT GAG CCA CAA ATG)
(SEQ ID NO:17) and B18Ag1-4 (CCG GAG GAG GAA GCT AGA GGA ATA) (SEQ
ID NO:18) at a 3.5 mM magnesium concentration and a pH of 8.5, and B18Ag1-2
(ATG GCT ATT TTC GGG GGC TGA CA) (SEQ ID NO:14) and B18Ag1-3 (CCG
GTA TCT CCT CGT GGG TAT T) (SEQ ID NO:15) at 2 mM magnesium at pH 9.5.
The same experiments showed exceedingly low to nonexistent levels of expression in
this patient's normal breast tissue (see Figure 1). RT-PCR experiments were then used
to show that B18Ag1 mRNA is present in nine other breast tumor samples (from
Brazilian and American patients) but absent in, or at exceedingly low levels in, the
normal breast tissue corresponding to each cancer patient. RT-PCR analysis has also
shown that the B18Ag1 transcript is not present in various normal tissues (including
lymph node, myocardium and liver) and present at relatively low levels in PBMC and
lung tissue. The presence of B18Ag1 mRNA in breast tumor samples, and its absence
from normal breast tissue, has been confirmed by Northern blot analysis, as shown in
Figure 2.

The differential expression of B18Ag1 in breast tumor tissue was also
confirmed by RNase protection assays. Figure 3 shows the level of B18Ag1 mRNA in
various tissue types as determined in four different RNase protection assays. Lanes 1-
12 represent various normal breast tissue samples; lanes 13-25 represent various breast
tumor samples; lanes 26-27 represent normal prostate samples; lanes 28-29 represent
prostate tumor samples; lanes 30-32 represent colon tumor samples; lane 33 represents
normal aorta; lane 34 represents normal small intestine; lane 35 represents normal skin;
lane 36 represents normal lymph node; lane 37 represents normal ovary; lane 38
represents normal liver; lane 39 represents normal skeletal muscle; lane 40 represents a
first normal stomach sample; lane 41 represents a second normal stomach sample; lane
42 represents a normal lung; lane 43 represents normal kidney; and lane 44 represents
normal pancreas. Interexperimental comparison was facilitated by including a positive
control RNA of known β-actin message abundance in each assay and normalizing the
results of the different assays with respect to this positive control.

RT-PCR and Southern blot analysis have shown the B18Ag1 locus to be
present in human genomic DNA as a single copy endogenous retroviral element. A
genomic clone of approximately 12-18 kb was isolated using the initial B18Ag1
sequence as a probe. Four additional subclones were also isolated by XbaI digestion.
Additional retroviral sequences obtained from these clones (located as shown in Figure
4) are shown as SEQ ID NO:3 - SEQ ID NO:10, where SEQ ID NO:3 shows the
location of the sequence labeled 10 in Figure 4, SEQ ID NO:4 shows the location of the
sequence labeled 11-29, SEQ ID NO:5 shows the location of the sequence labeled 3,
SEQ ID NO:6 shows the location of the sequence labeled 6, SEQ ID NO:7 shows the
location of the sequence labeled 12, SEQ ID NO:8 shows the location of the sequence
labeled 13, SEQ ID NO:9 shows the location of the sequence labeled 14 and SEQ ID
NO:10 shows the location of the sequence labeled 11-22.

Subsequent studies demonstrated that the 12-18 kb genomic clone
contains a retroviral element of about 7.75 kb, as shown in Figures 5A and 5B. The
sequence of this retroviral element is shown in SEQ ID NO:11. The numbered line at
the top of Figure 5A represents the sense strand sequence of the retroviral genomic
clone. The box below this line shows the position of selected restriction sites. The
arrows depict the different overlapping clones used to sequence the retroviral element.
The direction of the arrow shows whether the single-pass subclone sequence
corresponded to the sense or anti-sense strand. Figure 5B is a schematic diagram of the
retroviral element containing B18Ag1 depicting the organization of viral genes within
the element. The open boxes correspond to predicted reading frames, starting with a
methionine, found throughout the element. Each of the six likely reading frames is
shown, as indicated to the left of the boxes, with frames 1-3 corresponding to those
found on the sense strand.

Using the cDNA of SEQ ID NO:1 as a probe, a longer cDNA was
obtained (SEQ ID NO:12) which contains minor nucleotide differences (less than 1%)
compared to the genomic sequence shown in SEQ ID NO:11.

Example 2

Preparation of B18Ag1 DNA from Human Genomic DNA

This example illustrates the preparation of B18Ag1 DNA by
amplification from human genomic DNA.

B18Ag1 DNA may be prepared from 250 ng human genomic DNA
using 20 pmol of B18Ag1 specific primers, 500 pmol dNTPs and 1 unit of Taq DNA
polymerase (Perkin Elmer, Branchburg, NJ) using the following amplification
parameters: 94°C denaturing for 30 seconds, 30 second 60°C to 42°C touchdown
annealing in 2°C increments every two cycles and 72°C extension for 30 seconds. The
last increment (a 42°C annealing temperature) should cycle 25 times. Primers
(B18Ag1-1, B18Ag1-2, B18Ag1-3 and B18Ag1-4) were selected using computer
analysis and synthesized. Primer pairs that may be used are 1+3, 1+4, 2+3,
and 2+4.
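
The touchdown annealing program described above can be written out cycle by cycle. The sketch below is an illustration only and assumes one reading of the parameters: two cycles at each annealing temperature from 60°C down to 44°C in 2°C steps, followed by 25 cycles at 42°C. The function name and its parameters are hypothetical.

```python
def touchdown_annealing_schedule(start_c=60, end_c=42, step_c=2,
                                 cycles_per_step=2, final_cycles=25):
    """Return the annealing temperature (in degrees C) for each PCR cycle."""
    schedule = []
    temp = start_c
    # Step the annealing temperature down until the final temperature is reached.
    while temp > end_c:
        schedule.extend([temp] * cycles_per_step)
        temp -= step_c
    # The final annealing temperature is held for the remaining cycles.
    schedule.extend([end_c] * final_cycles)
    return schedule


schedule = touchdown_annealing_schedule()
for cycle, anneal_c in enumerate(schedule, start=1):
    # Each cycle: 94 C denaturing, annealing, 72 C extension, 30 seconds each.
    print(f"cycle {cycle}: 94 C / {anneal_c} C / 72 C")
```

Under this reading the program runs for 43 cycles in total: 18 touchdown cycles followed by 25 cycles at 42°C.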

Following gel electrophoresis, the band corresponding to B18Ag1 DNA
may be excised and cloned into a suitable vector.

Example 3

Preparation of B18Ag1 DNA from Breast Tumor cDNA

This example illustrates the preparation of B18Ag1 DNA by
amplification from human breast tumor cDNA.

First strand cDNA is synthesized from RNA prepared from human
breast tumor tissue in a reaction mixture containing 500 ng poly A+ RNA, 200 pmol of
the primer (T)12AG (i.e., TTT TTT TTT TTT AG) (SEQ ID NO:19), 1X first strand
reverse transcriptase buffer, 61 mM DTT, 500 mmol dNTPs, and 1 unit AMV or
MMLV reverse transcriptase (from any supplier, such as Gibco-BRL (Grand Island,
NY)) in a final volume of 30 µl. After first strand synthesis, the cDNA is diluted
approximately 25 fold and 1 µl is used for amplification as described in Example 2.
While some primer pairs can result in a heterogeneous population of transcripts, the
primers B18Ag1-2 (5'ATG GCT ATT TTC GGG GGC TGA CA) (SEQ ID NO:14)
and B18Ag1-3 (5'CCG GTA TCT CCT CGT GGG TAT T) (SEQ ID NO:15) yield a
single 151 bp amplification product.

Example 4

Identification of B-cell and T-cell Epitopes of B18Ag1

This Example illustrates the identification of B18Ag1 epitopes.

The B18Ag1 sequence can be screened using a variety of computer
algorithms. To determine B-cell epitopes, the sequence can be screened for
hydrophobicity and hydrophilicity values using the method of Hopp, Prog. Clin. Biol.
Res. 172B:367-77 (1985) or, alternatively, Cease et al., J. Exp. Med. 164:1779-84
(1986) or Spouge et al., J. Immunol. 138:204-12 (1987). Additional Class II MHC
(antibody or B-cell) epitopes can be predicted using programs such as AMPHI (e.g.,
Margalit et al., J. Immunol. 138:2213 (1987)) or the methods of Rothbard and Taylor
(e.g., EMBO J. 7:93 (1988)).
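
A hydrophilicity screen of the kind referred to above can be sketched as a sliding-window average. This is a minimal illustration, assuming the widely used Hopp-Woods hydrophilicity scale and a six-residue window; the cited references may use different values or window lengths, and the values below should be checked against the original publication before use. The function names are hypothetical. The example peptide is SEQ ID NO:20 from this specification.

```python
# Assumed Hopp-Woods hydrophilicity values (one-letter amino acid codes).
HOPP_WOODS = {
    "A": -0.5, "R": 3.0, "N": 0.2, "D": 3.0, "C": -1.0,
    "Q": 0.2, "E": 3.0, "G": 0.0, "H": -0.5, "I": -1.8,
    "L": -1.8, "K": 3.0, "M": -1.3, "F": -2.5, "P": 0.0,
    "S": 0.3, "T": -0.4, "W": -3.4, "Y": -2.3, "V": -1.5,
}


def hydrophilicity_profile(sequence, window=6):
    """Return (1-based start, segment, mean hydrophilicity) for each window.
    Raises KeyError for residues outside the 20 standard codes (e.g. X)."""
    profile = []
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        score = sum(HOPP_WOODS[aa] for aa in segment) / window
        profile.append((start + 1, segment, score))
    return profile


def most_hydrophilic_window(sequence, window=6):
    """Return the window with the highest mean hydrophilicity."""
    return max(hydrophilicity_profile(sequence, window), key=lambda row: row[2])


for start, segment, score in hydrophilicity_profile("SSGGRTFDDFHRYLLVGI"):
    print(start, segment, round(score, 2))
```

Windows with the highest average hydrophilicity are the usual candidates for surface-exposed, antibody-accessible regions under this approach.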

Once peptides (15-20 amino acids long) are identified using these
techniques, individual peptides can be synthesized using automated peptide synthesis
equipment (available from manufacturers such as Applied BioSystems, Inc., Foster
City, CA) and techniques such as Merrifield synthesis. Following synthesis, the
peptides can be used to screen sera harvested from either normal or breast cancer patients
to determine whether patients with breast cancer possess antibodies reactive with the
peptides. Presence of such antibodies in breast cancer patients would confirm the
immunogenicity of the specific B-cell epitope in question. The peptides can also be
tested for their ability to generate a serologic or humoral immune response in animals (mice, rats,
rabbits, chimps etc.) following immunization in vivo. Generation of a peptide-specific
antiserum following such immunization further confirms the immunogenicity of the
specific B-cell epitope in question.

To identify T-cell epitopes, the B18Ag1 sequence can be screened using
different computer algorithms which are useful in identifying 8-10 amino acid motifs
within the B18Ag1 sequence which are capable of binding to HLA Class I MHC
molecules (see, e.g., Rammensee et al., Immunogenetics 41:178-228 (1995)).
Following synthesis such peptides can be tested for their ability to bind to class I MHC
using standard binding assays (e.g., Sette et al., J. Immunol. 153:5586-92 (1994)) and
more importantly can be tested for their ability to generate antigen reactive cytotoxic T-
cells following in vitro stimulation of patient or normal peripheral mononuclear cells
using, for example, the methods of Bakker et al., Cancer Res. 55:5330-34 (1995);
Visseren et al., J. Immunol. 154:3991-98 (1995); Kawakami et al., J. Immunol.
154:3961-68 (1995); and Kast et al., J. Immunol. 152:3904-12 (1994). Successful
in vitro generation of T-cells capable of killing autologous (bearing the same class I
MHC molecules) tumor cells following in vitro peptide stimulation further confirms the
immunogenicity of the B18Ag1 antigen. Furthermore, such peptides may be used to
generate murine peptide and B18Ag1 reactive cytotoxic T-cells following in vivo
immunization in mice rendered transgenic for expression of a particular human MHC
Class I haplotype (Vitiello et al., J. Exp. Med. 173:1007-15 (1991)).

A representative list of predicted B18Ag1 B-cell and T-cell epitopes,
broken down according to predicted HLA Class I MHC binding antigen, is shown
below:

Predicted Th Motifs (B-cell epitopes)

SSGGRTFDDFHRYLLVGI (SEQ ID NO:20)
QGAAQKPINLSKXIEVVQGHDE (SEQ ID NO:21)
SPGVFLEHLQEAYRIYTPFDLSA (SEQ ID NO:22)

Predicted HLA A2.1 Motifs (T-cell epitopes)

YLLVGIQGA (SEQ ID NO:23)
GAAQKPINL (SEQ ID NO:24)
NLSKXIEVV (SEQ ID NO:25)
EVVQGHDES (SEQ ID NO:26)
HLQEAYRIY (SEQ ID NO:27)
NLAFVAQAA (SEQ ID NO:28)
FVAQAAPDS (SEQ ID NO:29)

From the foregoing, it will be appreciated that, although specific
embodiments of the invention have been described herein for the purpose of
illustration, various modifications may be made without deviating from the spirit and
scope of the invention.

SEQUENCE LISTING



(1) GENERAL INFORMATION: 

(i) APPLICANT: Corixa Corporation

(ii) TITLE OF INVENTION: COMPOUNDS AND METHODS FOR THE TREATMENT 
AND DIAGNOSIS OF CANCER 

(iii) NUMBER OF SEQUENCES: 29 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: SEED and BERRY LLP 

(B) STREET: 6300 Columbia Center, 701 Fifth Avenue

(C) CITY: Seattle 

(D) STATE: Washington 

(E) COUNTRY: USA 

(F) ZIP: 98104-7092 

(v) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS 

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: 

(B) FILING DATE: 10-JAN-1997

(C) CLASSIFICATION: 

(viii) ATTORNEY/AGENT INFORMATION: 
(A) NAME: Maki, David J.

(B) REGISTRATION NUMBER: 31.^92

(C) REFERENCE/DOCKET NUMBER: 210121. 41BPC

(ix) TELECOMMUNICATION INFORMATION:

(A) TELEPHONE: (206) 622-4900 

(B) TELEFAX: (206) 682-6031 



(2) INFORMATION FOR SEQ ID NO:1:



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 415 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

TTGANTGTCA AAAACCHNT AGGCTATCTC TAAAAGCTGA CTGGTATTCA TTCCAGCAAA 60 

ATCCCTCTAG TTTTTGGAGT nCCTTTTAC TATCTGGGGC TGCCTGAGCC ACAAATGCCA 120 

AATTAAGAGC ATGGCTATTT TCGGGGGCTG ACAGGTCAAA AGGGGTGTAA ATCCGATAAG 180

CCTCCTGGAG GTGCTCTAAA AACACTCCTG 6TGACTCATC ATGCCCCT66 ACGACHCAA 240 

TCGNCHAGA CAAGTTTATA GGTTTCTGGG CAGTCCCTGA ATACCCACGA GGAGATACCG 300 

GTGGAAATCG TCAAAAGTTC TCCCTCCACT TGAGAAATTT GGGTCCCAAT TAGGTCCCAA 360 

HGGGTCTCT AATCACTATT CCTQAGCTT CCTCCTCCGG NQAnGGH GATGT 415 

(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 96 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

Trp Asp Pro Asn Phe Ser Ser Gly Gly Arg Thr Phe Asp Asp Phe His
1               5                   10                  15

Arg Tyr Leu Leu Val Gly Ile Gln Gly Ala Ala Gln Lys Pro Ile Asn
            20                  25                  30

Leu Ser Lys Xaa Ile Glu Val Val Gln Gly His Asp Glu Ser Pro Gly
        35                  40                  45

Val Phe Leu Glu His Leu Gln Glu Ala Tyr Arg Ile Tyr Thr Pro Phe
    50                  55                  60

Asp Lys Ser Ala Pro Glu Asn Ser His Ala Leu Asn Leu Ala Phe Val
65                  70                  75                  80

Ala Gln Ala Ala Pro Asp Ser Lys Arg Lys Leu Gln Lys Leu Glu Gly
                85                  90                  95
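To compare a three-letter listing such as SEQ ID NO:2 with peptides written in one-letter code, the listing can be converted mechanically. The sketch below is illustrative only and assumes the standard three-letter codes (for example Ile and Gln), with Xaa mapped to X; the function name `to_one_letter` is hypothetical. Position-number rows are skipped, and any unrecognized token raises a `KeyError`, which helps to flag transcription damage.

```python
# Standard three-letter to one-letter amino acid codes; Xaa denotes an
# unknown residue and is mapped to X.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Xaa": "X",
}


def to_one_letter(listing):
    """Convert three-letter residue rows to a one-letter string.

    Purely numeric tokens (the position rows under each line of residues)
    are ignored; an unknown residue token raises KeyError.
    """
    residues = []
    for token in listing.split():
        if token.isdigit():
            continue  # position marker, not a residue
        residues.append(THREE_TO_ONE[token])
    return "".join(residues)


if __name__ == "__main__":
    # Residues 17-32 of SEQ ID NO:2, with their position row.
    row = (
        "Arg Tyr Leu Leu Val Gly Ile Gln Gly Ala Ala Gln Lys Pro Ile Asn\n"
        "20 25 30"
    )
    print(to_one_letter(row))
```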



(2) INFORMATION FOR SEQ ID NO:3:



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1180 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:

NCNNNNNHA TGATTACGCC AAGCGNGCAA HAACCCTCA CTAAAGG6AA CAAAAGCTGG 60 

AGCTCCACCG CGGTGGCGGC CGCTAGAATC HCATACCCC GAACTCTTGG GAAAACTTTA 120 

ATCAGTCACC TACAGTCTAC CACCCAITTA GGA6GAGCAA AGCTACCTCA GCTCCTCCGG 180 

AGCCGTTTTA A6ATCCCCCA TCTTCAAAGC CTAACAGATC AAGCAGCTCT CCGGTGCACA 240 

ACCTGCGCCC AGGTAAATGC CAAAAAAGGT CCTAAACCCA GCCCAGGCCA CCGTCTCCAA 300 

GAAAACTCAC CAGGAGAAAA GTGGGAAATT GACTTTACAG AAGTAAAACC ACACCGGGCT 360 

GGGTACAAAT ACCTTCTAGT ACTGGTAGAC ACCTTCTCTG GATGGACTGA AGCATTTGCT 420 

ACCAAAAACG AAACTGTCAA TATGGTAGTT AAGnTTTAC TCAATGAAAT CATCCCTCGA 480 

CGTGGGCTGC CTGTTGCCAT AGGGTCTGAT AATGGAACGG CCTTCGCCn GTCTATAGTT 540 

TAATCAGTCA GTAAGGCGTT AAACATTCAA TGGAAGCTCC ATTGTGCCTA TCGACCCAGA 600 

GCTCTGGGAA GTAGAACGCA TGAACTGCAC CCTAAAAAAA CACTCHACA AAATTAATCT 660 

TAAAAACCGG TGTTAA7TGT GTTAGTCTCC TTCCCnAGC CCTACTTAGA GHAAGGTGC 720 

ACCCCTTACT GGGCTGGGTT CTTTACCT7T TGAAATCATN TTTNGGAAGG GGCTGCCTAT 780 

CTTTNCnAA CTAAAAAANG CCCATTTG6C AAAAATTTCN CAACTAATTT NTACGTNCCT 840 



ACGTCTCCCC AACAGGTANA AAAATCTNCT GCCCT7TTCA AGGAACCATC CCATCCATTC 900

CTNAACAAAA GGCCTGCCNT TCTTCCCCCA GTTAACTNTT TTTTNnAAA AHGCCAAAA 960

AANGMCCNC CTGCTGGAAA AACNCCCCCC TCCMNCCCC GGCCNAAGNG GAAGGHCCC 1020 

TTGAATCCCN CCCCCNCNAA NGGCCCGGAA CCNHAAANT NGTTCCNGGG GGTNNGGCCT 1080 

AAAAGNCCNA nTGGTAAAC CTANAAATTT TrTCTTTTNT AAAAACCACN NTTTNNTTTT 1140 

TCTTAAACAA AACCCTNTTT NTAGNANCNT ATTTCCCNCC 1180 
(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1163 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:

TNCTTTGATA CCCNAGCGH CAAHAACCC TCACTAAAGG GAACAAAAGC TGGA6CTCCA 60 

CCGCGGTGGC GGCCGCTCTA GAGCTGCGCC TGGATCCC6C CACAGTGAGG AGACCTGAAG 120 

ACCAGAGAAA ACACAGCAAG TAGGCCCTTT AAACTACTCA CCTGTGTT6T CnCTAATTT 180 

ATTCTGTTTT ATTTTGTTTC CATCATTTTA AGGGGHAAA ATCATCHGT TCAGACCTCA 240 

GCATATAAAA TGACCCATCT GTAGACCTCA GGCICCAACC ATACCCCAAG AGTTGTCTGG 300

TTTTGITTAA AHACTGCCA GGTTTCAGCT GCAGATATCC CTGGAAGGAA TATTCCAGAT 360

TCCCT6AGTA GfTTCCAGGT TAAAATCCTA TAGGCHCn CTGTTTTGAG GAAGAGTTCC 420

TGTCAGAGAA AAACATGAH TTG6ATTTTT AACTTTAATG CHGTGAAAC GCTATAAAAA 480 

AAATTTTCTA CCCCTAGCTT TAAAGTACTG TTAGT6AGAA AHAAAATTC CTTCAGGAGG 540 

ATTAAACT6C CATTTCAGTT ACCCTAAHC CAAATGTTTT GGTGGTTAGA ATCTTCTTTA 600 

ATGTTCnGA AGAAGTGTTT TATATTTTCC CATCNAGATA AATTCTCTCN CNCCTTNNTT 660 

TTNTNTCTNN 1 1 1 1 1 lAAAA CGGANTCTTG CTCCGHGTC CANGCTGGGA ATTTTNTTTT 720 

GGCCAATCTC CGCTNCCTTG CAANAATNCT GCNTCCCAAA ATTACCNCCT TTTTCCCACC 780 

TCCACCCCNN GGAAHACCT GGAAHANAG GCCCCCNCCC CCCCCCCGGC TAATTTGITT 840 

TTGTTTTTAG TAAAAAACGG GTTTCCTGTT HAGHAGGA TGGCCCANNT CTGACCCCNT 900 

NATCNTCCCC CTCNGCCCTC NAATNTTNGG NNTANG6C1T ACCCCCCCCN GNNGnTTTC 960 

CTCCATTNAA ATTTTCTNTG GANTCTTGAA TNNCGGGTTT TCCCTTTTAA ACCNATnTT 1020 

TmTNNNNC CCCCANTTTT NCCTCCCCCN TNTNTAANGG GGGTTTCCCA ANCCGGGTCC 1080 

NCCCCCANGT CCCCAATTTT TCTCCCCCCC CCTCTTTTTT CTTTNCCCCA AAANTCCTAT 1140 

CTTTTCCTNN AAATATCNAN TNT 1163 
(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1122 base pairs 

(B) TYPE: nucleic acid 



(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:

NNGGTCCNNC TCAAAGTCAN TATAGGGCGA ATTGGGTACC 6GGCCCCCCC TCGAGGTCGA 60

CGGTATCGAT AAGCHGATA TCGAATTCCT GCAGCCCGGG 6GATCCACTA GHCTAGACC 120 

AAGAAATGGA GGATTTTAGA GTGACTGATG ATTTCTCTAT CATCTGCAGT TAGTAAACAT 180 

TCTCCACAGT HATGCAAAA AGTAACAAAA CCACTGCA6A TGACAAACAC TAGGTAACAC 240 

ACATACTATC TCCCAAATAC CTACCCACAA GCTCAACAAT TTTAAACTGT TAG6ATCACT 300 

GGCTCTAATC ACCATGACAT GA6GTCACCA CCAAACCATC AAGCGCTAAA CAGACAGAAT 360 

GTTTCCACTC CTGATCCACT GTGTGGGAAG AAGCACCGAA CTTACCCACT GGGGGGCCTG 420 

CNTCANAANA AAAGCCCATG CCCCCGGGTN TNCCTITNAA CCGGAACGAA TNAACCCACC 480 

ATCCCCACAN CTCCTCTGTT CNTGG6CCCT GCATCHGTG GCCTCNTNTN CTTTNGGGGA 540 

NACNTGGGGA AGGTACCCCA TTTCNTTGAC CCCNCNANAA AACCCCNGT6 6CCGTTT6CC 600 

CTGAnCNCN TGGGCCTTTr CTCTnTCCC TTTTGGGnG TTTAAAnCC CAATGTCCCC 660 

NGAACCaCT CCNTNCTGCC CAAAACCTAC CTAAATTNCT CNCTANGNNT TTTCTTGGTG 720 

nNCTTTTCA AAGGTNACCT TNCCTGHCA NNCCCNACNA AAATTTNTTC CNTATNNTGG 780 

j^|>VTM|i|AAAA AMKiMATPMhir rrNAATTGCC CGAAHGGn NGGTT TTTCC TNC TGGGGGA 840 

AACCCTTTAA ATTTCCCCCT TGGCCGGCCC CCCTTTTnC CCCCCTTTNG AAGGCAGGNR 900

GGTTCTTCCC GMCHCCAA TTNCAACAGC CNTGCCCATT GNTGAAACCC TTTTCCTAAA 960

ATTAAAAAAT ANCCGGHNN G6NNGGCCTC TTTCCCCTCC NGGNGGGNNG N6AAANTCCT 1020 

TACCCCNAAA AAGGTTGCn AGCCCCCNGT CCCCACTCCC CCNGGAAAAA TNAACCTTTT 1080 

CNAAAAAAGG AATATAANTT TNCCACTCCT TNGTTCTCn CC 1122 
(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1091 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

NCNNNCCNH T6TNAAA6AC CGNCAGTGAG CGCGCGTAAT ACGACTCACT ATAGGGCGAA 60 

TTGGGTACCG GGCCCCCCCT CGAGGTCGAC GGTATCGATA AGCHGATAT CGAATTCCTG 120 

CAGCCCGGGG GATCCACTAG TTCTAGAGCT CGCGGCC6CG AGCTCTAATA CGACTCACTA 180 

TAGGGCGTCG ACTCGATCTC AGCTCACTGC AATCTCTGCC CCCGGGGTCA TGCGAHCTC 240 

CTGCCTCA6C CTTCCAAGTA GCTGGGATTA CAGGC6TGCA ACACCACACC CGGCTAATTT 300

TGTATTnTA ATAGAGATGG GGTTTTCCCT TGHGGCCAN NATGGTCTCN AACCCCTGAC 360

CTCNNGTGAT CCCCCCNCCC NNGANCTCNN ACTGCTGGGG ATNNCCGNNN NNNNCCTCCC 420

NNCNCNNNNN NNCNCNNTCC NTNNTCCnN CTCNNNNNNN NCNNTCNNTC CNNCHCTCN 480

CCNNNTNTTN TCNNCNNCCN MCNNNCCNCN TNCCCNCNNN TTCNCNTNCN NTNTCCNNCN 540 

NNNTCNNCNN NCNNNNCffTN NCCNNTACNT CNTNNNCNNN TCCNTCTNTN NCCTCNNCNN 600 

TCNCTNCNCN HNTCTCCTC NNTNNNNNNC TCCNNNNNTC TCNTCNCNNC NTNCCTCNNT 660 

NNCCNCNCCC CNCCTCNCNN CCTNNTTTNN NCNNCNNNTC CNTNCCNHC NNNTCCNNTN 720 

NCNNCNTCNC NNNCNHNH CCCNCCNNTT CCHNCNCNT MNNNTNTCNN NCNCNTCNNT 780 

CNTTTNCTCC TNNNTCCCNN CTCNNTTCNC CCNNNTCCNC CCCCCNCCTN TCTCTCNCCC 840 

NNNTNNNTNT NNNNCNTCCN CTNTCNCNTT CNTCNNTNCN TTNCTNTCNN CNNCNNTNCN 900 

CTNCCNTNTN TCTNNNTCNC NTCNCNTNTC NCCNTCCNH NCTNTCTCCT NTNTCCTTCC 960 

CCTCNCCTNC TCNTTCNCCN CCCNNTNTNT NTNNCNCCNN TNCTNNNCNN CCNTCNTTTC 1020 

NTCTCTNCTN NNNNTNNCCT CNNCCCNTNC CCTNNTNCNC TNCTNNTACC NTNCTNCTCC 1080 

NTCTTCCTTC C 1091
(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1165 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

NCNNNHATG ATTACGCCNA CGNNCAATTA ACCTCACTAA AGGGAACAAA AGCTGGAGCT 60 

CCACCGCGGT GGCGGCCGCT CTAGAGCTCG CGGCCGCGAG CTCAA1TAAC CCTCACTAAA 120 

GGGAGTCGAC TCGATCAGAC TGHACTGIG TCTATGTAGA AAGAAGTAGA CATAAGAGAT 180 

TCCATnTGT TCTGTACTAA GAAAAAHCT TCTGCCnGA GATGCTGTTA ATCTGTAACC 240 

CTA6CCCCAA CCCTGTGCTC ACAGAGACAT GTGCTGTGn GACTCAAGGT TCAATGGATT 300 

TAGGGCTATG ClTTGnAAA AAAGTGCTTG AAGATAATAT GCTTGTTAAA AGTCATCACC 360 

AHCTCTAAT CTCAAGTACC CAGGGACACA ATACACTGCG GAAGGCCGCA GGGACCTCTG 420 

TCTAGGAAAG CCAGGTAHG TCCAAGATTT CTCCCCATGT GATAGCCTGA GATATGGCCT 480 

CATGGGAAGG GTAAGACCTG ACTGTCCCCC AGCCCGACAT CCCCCAGCCC GACATCCCCC 540 

AGCCCGACAC CCGAAAAGGG TCTGTGCTGA GGAAGATTAN TAAAAGAGGA AGGCTCTTTG 600 

CATTGAA6TA AGAAGAAGGC TCTGTCTCCT GCTC6TCCCT GGGCAATAAA ATGTCTTGGT 660 

GTTAAACCCG AATGTATGTT CTACTTACTG AGAATAGGAG AAAACATCCT TAGGGCTGGA 720 

GGTGAGACAC CCTGGCGGCA TACTGCTCTT TAATGCACGA GATGTTTGTN TAATTGCCAT 780 

CCAGGGCCAN CCCCTTTCCT TAACTmTA TGANACAAAA ACTTTGTTCN CTTnCCTGC 840 

GAACCTCTCC CCCTAnANC CTATTGGCCT GCCCATCCCC TCCCCAAANG GTGAAAANAT 900 

6TTCNTAAAT NCGAGGGAAT CCAAAACNTT HCCCGnGG TCCCCTTTCC AACCCCGTCC 960 



CTGGGCCNNT HCCTCCCCA ACNTGTCCCG GNTCCTTCNT TCCCNCCCCC TTCCCNGANA 1020 
AAAAACCCCG TNTGANGGNG CCCCCTCAAA TTATAACCn TCCNAAACAA ANNGGHCNA 1080

AGGTGGTTTG NTTCCGGTGC 6GCTGGCCTT GAGGTCCCCC CTNCACCCCA ATTTGGAANC 1140 

CNGIIIIIII TATTGCCCNN TCCCC 1165 
(2) INFORMATION FOR SEQ ID NO:8:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1177 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:

NCCNTTTAGA TGTTGACAAN NTAAACAAGC NGCTCAGGCA GCTGAAAAAA GCCACTGATA 60 

AAGCATCCTG GAGTATCAGA GTTTACTGTT AGATCAGCCT CATTTGACTT CCCCTCCCAC 120 

ATGGTGTTTA AATCCAGCTA CACTACHCC TGACTCAAAC TCCACTATTC CTGTTCATGA 180 

CTGTCAGGAA CTGHGGAAA CTACT6AAAC TGGCCGACCT GATCTTCAAA ATGTGCCCCT 240 

AGGAAAGGTG GATGCCACCG TGHCACAGA CAGTACCNCC TTCCTCGAGA AGGGAQACG 300 

AGGGGCCG6T GCANCTGHA CCAAGGAGAC TNATGTGTT6 TGGGCTCAGG CTFTACCANC 360 

MACACCTCA NCN CNNAAGG CTGAATTGAT CGCCCTCACT CAGGCTCTCG GATGG6GTAA 420 

GGGATAHAA CGHAACACT GACAGCAGGT ACGCCTTT6C TACTGTGCAT GTACGTGGAG 480 
CCATCTACCA GGAGCGT6GG CTACTCACTC GGCAGGTGGC TGTNATCCAC TGTAAANGGA 540

CATCAAAAGG AAAACNNGGC TGHGCCCGT GGTAACCANA AANCTGATCN NCAGCTCNAA 600 

GATGCTGTGT TGACTTTCAC TCNCNCCTCT TAAACHGCT 6CCCACANTC TCCTTTCCCA 660 

ACCAGATCTG CCTGACAATC CCCATACTCA AAAAAAAAAN AANACTGGCC CCGAACCCNA 720 

ACCAATAAAA ACGGGGANGG TNGGTNGANC NNCCTGACCC AAAAATAATG GATCCCCCGG 780 

6CTGCAGGAA HCAATTCAN CCTTATCNAT ACCCCCAACN NGGNGG6GGG GGCCNGTNCC 840 

CA1TNCCCCT NTATTNAnC TTTNNCCCCC CCCCCGGCNT CCTTTTTNAA GTCGTGAAAG 900 

GGAAAACCTG NCTTACCAAN TTATCNCCTG GACCNTCCCC TTCCNCGGTN GNTTANAAAA 960 

AAAAGCCCNC ANTCCCNTCC NAAATTTGCA CNGAAAGGNA AGGAATTTAA CCTTTATTTT 1020 

nNNTCCTTT ANTTTGINNN CCCCCTTTTA CCCAGGCGAA CNGCCATCNT HAANAAAAA 1080 

AAANAGAANG nTATTTTTC CHNGAACCA TCCCAATANA AANCACCCGC NGGGGAACGG 1140 

GGNGGNAGGC CNCTCACCCC CTTTNTGTNG GNGGGNC 1177 
(2) INFORMATION FOR SEQ ID NO:9:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1146 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:

NCCNNTTNNT GATGHGTCT TTrTGGCCTC TCTTTGGATA CTTTCCCTCT CHCAGAGGT 60 

GAAAAGG6TC AAAAGGAGCT GTTGACA6TC ATCCCAGGTG GGCCAATGTG TCCAGAGTAC 120 

AGACTCCATC A6TGAGGTCA AAGCCTGG6G CTTTTCAGA6 AAGGGAGGAT TATGGGTTTT 180 

CCAATTATAC AAGTCAGAAG TAGAAAGAAG GGACATAAAC CAGGAAGGGG GTGGAGCACT 240 

CATCACCCAG AGGGACTTGT GCCTCTCTCA GTGGTAGTAG AGGGGCTACT TCCTCCCACC 300 

ACGGTTGCAA CCAAGAGGCA ATGGGTGATG AGCCTACAGG GGACATANCC GAGGAGACAT 360 

GGGATGACCC TAAGGGAGTA 6GCT6GTTTT AAGGCGGTGG GACTGQGTGA GGGAAACTCT 420 

CCTCTTCTTC AGAGAGAAGC AGTACAGGGC GAGCTGAACC GGCTGAAGGT CGAGGCGAAA 480 

ACACGGTCTG GCTCAG6AAG ACCHGGAAG TAAAATTATG AATGGTGCAT GAATGGAGCC 540 

ATGGAAGGGG TGCTCCTGAC CAAACTCAGC CAHGATCAA TGTTAGGGAA ACTGATCAGG 600 

GAAGCCGGGA ATTTCAnAA CAACCCGCCA CACAGCHGA ACAHGTGAG GHCAGTGAC 660 

CCnCAAGGG 6CCACTCCAC TCCAACTTTG 6CCATTCTAC TTTGCNAAAT HCCAAAACT 720 

TCCrmTTA AG6CCGAATC CNTANTCCCT NAAAAACNAA A^AAAATCTG CNCCTATTCT 780 

G6AAAAG6CC CANCCCTTAC CAGGCTGGAA GAAATTTTNC CTTTTTTTTT TTTTTGAAGG 840 

CNTTTNTTAA AHGAACCTN AAnCNCCCC CCCAAAAAAA AACCCNCCNG GG6GGC6GAT 900 

jjrsfsm/srjifsm.^^^-^ ArrA&AAAAr AAAAAnr.cNc cct tnttccc ttccnccctn 960

TTCTTTTAAT TAG6GAGAGA TNAAGCCCCC CAATTTCCNG GNCTNGATNN GTTTCCCCCC 1020 
CCCCCATTTT CCNAAACTTT HCCCANCNA GGAANCCNCC CTTTTTTTNG GTCNGAHNA 1080

NCAACCTTCC AAACCATTTT TCCNNAAAAA NTTTGNTNGG NG6GAAAAAN ACCTNNTTTT 1140 

ATAGAN 1146 
(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 545 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 

CTTCAnOGG TACGGGCCCC CTCGACCTCG ACGGTATCGA TAAGCTTGAT ATCGAAHCC 60 

TGCA6CCCGG GGGATCCACT AGTTCTAGAG TCAGGAAGAA CCACCAACCT TCCT6ATTTT 120 

TATTGGCTCT GAGTTCTGAG GCCAGT7TTC TTCTTCTGTT GAGTATGCGG GAHGTCAGG 180 

CAGATCTGGC T6TGGAAAGG AGACTGTGG6 CAGCAAGTn" AGAGGCGTGA CTGAAAGTCA 240 

CACTGCATCT TGAGCTGCTG AATCAGCTFT CTGGTTACCA CGGGCAACAG CCGTGTTTTC 300 

CTITTGATGT CCTTTACAGT GGAHACAGC CACCTGCTGA GGTGAGTAGC CCACGCTCCT 360 

GGTAGATGGC TCCACGTACA TGCACA6TAG CAAAGGCGTA CCTGCTGTCA GTGTTAACGT 420 



TAATATCCTT ACCCCATCGG AGAGCCTGAG TGAGGGCGAT CAAHCAGCC CTTTTGTGCT 480 
GAGGTGITTG CIGGHAAGC CCTGAACCCA CAACACATCT GTCTCCATGG TAACAGCTGC 540
ACCGG 545 
(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 9388 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 

GCTCGCGGCC GCGAGCTCAA HAACCCTCA CTAAAGGGAG TCGACTCGAT CAGACTGTTA 60 

CTGTGTCTAT GTAGAAAGAA GTAGACATAA GAGAHCCAT TTTGTTCTGT ACTAAGAAAA 120 

AITCTTCTGC CTTGAGATGC TGTTAATCTG TAACCCTAGC CCCAACCCTG TGCTCACAGA 180 

GACATGTGCT GTGTTGACTC AAGGTTCAAT GGATTTAGGG CTATGCTTTG TTAAAAAAGT 240 

GCHGAAGAT AATATGCTTG TTAAAAGTCA TCACCAHCT CTAATCTCAA GTACCCAGG6 300 

ACACAATACA CTGCGGAAGG CCGCAGGGAC CTCTGTCTAG GAAAGCCAGG TATT6TCCAA 360 

GATTTCTCCC CAT6TGATAG CCTGAGATAT G6CCTCATGG GAAGGGTAAG ACCTGACTGT 420 

CCCCCAGCCC 6ACATCCCCC AGCCCGACAT CCCCCAGCCC GACACCCGAA AAGGGTCTGT 480 

GCTGAGGAGG ATTA6TAAAA GAGGAAGGCC TCTTTGCAGT TGAGGTAAGA GGAAGGCATC 540 



TGTCTCCTGC TCGTCCCTGG GCAATAGAAT GTCHGGTGT AAAACCCGAT TGTATGHCT 600 
ACTTACTGA6 ATAGGAGAAA ACATCCTTA6 GGCTGGAGGT GAGACACGCT GGCGGCAATA 660
CTGCTCTTTA ATGCACCGAG ATGTTTGTAT AA6TGCACAT CAAGGCACAG CACCTTTCCT 720 
TAAACTTATT TATGACACAG AGACCTTTGT TCACGTTTTC CTGCTGACCC TCTCCCCACT 780 
ATTACCCTAT TGGCCTGCCA CATCCCCCTC TCCGAGATGG TAGAGATAAT GATCAATAAA 840 
TACTGAGGGA ACTCAGAGAC CAGTGTCCCT GTAGGTCCTC CGTGTGCTGA GCGCCGGTCC 900 
CnGGGCTCA CTTTTCTTTC TCTATACTTT GTCTCTGTGT CTCnTCTTT TCTCAGTCTC 960 

TCGTTCCACC TGACGAGAAA TACCCACAGG TGTGGAGGGG CAGGCCACCC CTTCAATAAT 1020 

TTACTAGCCT GHCGCTGAC AACAAGACTG GTGGTGCAGA AGGnGGGTC TTGGTGTTCA 1080 

CCGGGT6GCA GGCATGG6CC AGGTGGGAGG GTCTCCAGCG CCTGGTGCAA ATCTCCAAGA 1140 

AAGTGCAGGA AACAGCACCA AGGGTGATTG TAAATTTTGA TTTGGCGCGG CAGGTAGCCA 1200 

TTCCAGCGCA AAAATGCGCA GGAAAGCTTT TGCTGTGCTT GTAGGCAGGT AGGCCCCAAG 1260 

CACnCTTAT TGGCTAATGT GGAGGGAACC TGCACATCCA HGCCTGAAA TCTCCGTCTA 1320 

TTTGAGGCTG ACTGAGCGCG TTCCTTTCn CTGTGTTGCC TGGAAACGGA CTGTCTGCCT 1380 

AGTAACATCT GATCACGHT CCCATTGGCC GCCGTTTCCG GAAGCCCGCC CTCCCATTTC 1440 

CGGAA6CCTG GCGCAAGGH GGTCTGCAGG TGGCCTCCAG GTGCAAAGT6 GGAAGTGTGA 1500 

GTCCTCAGTC nGGGCTATT CGGCCAC6TG CCTGCCGGAC ATGGGACGCT GGAGGGTCA6 1560 

CAGCGTGGAG TCCTGGCCn TTGCGTCCAC GGGTGGGAAA nGGCCATTG CCACGGCGGG 1620 

AACTGGGACT CAGGCTGCCC CCCGGCCGTT TCTCATCCGT CCACCGGACT CGTGGGCGCT 1680 
CGCACTGGCG CTGATGTAGT TTCCTGACCT CTGACCCGTA TTGTCTCCAG AHAAAGGTA 1740

AAAACGGGGC mTTCAGCC CACTCGGGTA AAACGCCTTT TGATTTCTAG GCAGGTGTn 1800 

TGTTGCACGC CTGGGA6GGA GTGACCCGCA GGnGAGGTT TAHAAAATA CAnCCTGGT 1860 

TTATGnATG TTTATAATAA AGCACCCCAA CCTTTACAAA ATCTCACTTT TTGCCAGTTG 1920 

TAnATTTAG TGGACTGTCT CTGATAAGGA CAGCCAG7TA AAATGGAAH TTGTTGnGC 1980 

TAAHAAACC AATTTTTAGT ITTGGTGTn GTCCTAATAG CAACAACTTC TCAGGCTTTA 2040 

TAAAACCATA TTTCTTGGGG GAAATTTCTG TGTAAGGCAC AGCGAGTTAG TnGGAATTG 2100 

TTTTAAAGGA AGTAAGTTCC TGGTTTTGAT ATCHAGTAG TGTAATGCCC AACCTGGTTT 2160 

TTACTAACCC T6TTTTTAGA CTCTCCCTTT CCHAAATCA CCTAGCCHG TTTCCACCTG 2220 

AAHGACTCT CCCHAGCTA AGAGCGCCAG ATGGACTCCA TCnGGCTCT nCACTGGCA 2280 

GCCCCTTCCT CAAGGACHA ACTTGTGCAA GCTGACTCCC AGCACATCCA AGAATGCAAT 2340 

TAACTGHAA GATACTGTGG CAAGCTATAT CCGCAGHCC GAGGAATTCA TCCGATT6AT 2400 

TATGCCCAAA AGCCCCGCGT CTATCACCTT 6TAATAATCT TAAAGCCCCT GCACCTGGAA 2460 

CTATTAACTT TCCTGTAACC ATTTATCCTT nAACTTTTT TGCTTACTTT ATTTCTGTAA 2520 

AATTGnTTA ACTAGACCTC CCCTCCCCTT TCTAAACCAA AGTATAAAAG AAGATCTAGC 2580 

CCCncnCA GAGCGGAGAG AATTTTGAGC AHAGCCATC ICnGGCGGC CAGCTAAATA 2640 

AATGGACTTT TAATTTGTCT CAAAGTGTGG CGTTTTCTCT AACTCGCTCA GGTACGACAT 2700 



TTGGAGGCCC CAGCGAGAAA CGTCACCGGG AGAAACGTCA CCGGGC6AGA GCCGGGCCCG 2760 
CTGT6T6CTC CCCCGGAAGG ACAGCCAGCT TGTAGGGGGG AGTGCCACCT 6AAAAAAAAA 2820

TTTCCAGGTC CCCAAAGGGT GACCGTCTTC CGGAGGACAG C6GATCGACT ACCATGCGGG 2880 

TGCCCACCAA AATTCCACCT CT6AGTCCTC AACT6CTGAC CCCGGGGTCA GGTAGGTCAG 2940 

ATTTGACTTT GGTTCTGGCA GAGGGAAGCG ACCCTGATGA 6G6TGTCCCT CTTTTGACTC 3000 

TGCCCATTTC TCTAGGATGC TAGAGGGTAG AGCCCIGGH nCTGTTAGA CGCCTCTGTG 3060 

TCTCTGTCTG GGAGGGAAGT GGCCCTGACA GG6GCCATCC CTTGAGTCAG TCCACATCCC 3120 

AGGATGCTG6 GGGACTGA6T CCTGGTTTCT GGCAGACT6G TCTCTCTCTC TCTCTTTTTC 3180 

TATCTCTAAT CTTTCCnGT TCAGGTTTCT TGGAGAATCT CTGGGAAAGA AAAAAGAAAA 3240 

ACTGTTATAA ACTCTGTGTG AAT6GTGAAT GAATGGGGGA GGACAAGGGC nGCGCTTGT 3300 

CCTCCAGTTT GTAGCTCCAC GGCGAAAGCT ACGGAGTTCA A6TGGGCCCT CACCT6CGGT 3360 

TCCGTGGCGA CCTCATAAGG CTTAAGGCAG CATCCGGCAT AGCTCGATCC GAGCCGGGGG 3420 

UTATACCGG CCTGTCAATG CTAAGAGGAG CCCAAGTCCC CTAAGGG6GA GCGGCCAGGC 3480 

G66CATCTGA CTGATCCCAT CACGGGACCC CCTCCCCTTG TITGTCTAAA AAAAAAAAAA 3540 

GAAGAAACTG TCATAACTGT HACATGCCC TAGGGTCAAC TGTTTGTTrT ATGTTTATTG 3600 

TTCTGnCGG TGTCTATT6T CTTGTTTAGT GGTTGTCAAG GTTTTGCATG TCAGGACGTC 3660 

GATATTGCCC AAGAC6TCT6 GGTAAGAACT TCTGCAAGGT CCnAGTGCT GATTTTTTGT 3720 

CACAGGAGGT TAAATTTCTC ATCAATCAH TAGGCTGGCC ACCACAGTCC TGTCTTTTCT 3780 

GCCAGAAGCA AGTCAGGTGT TGTTACGGGA ATGA6TGTAA AAAAACATTC GCCTGATTGG 3840 
GATTTCTGGC ACCATGATGG nGTATTTAG ATTGTCATAC CCCACATCCA GGnGATTGG 3900

ACCTCCTCTA AACTAAACTG GTGGT6GGTT CAAAACAGCC ACCCTGCAGA TTTCCTTGCT 3960 

CACCTCTTTG GTCATTCTGT AACTTTTCCT GT6CCCTTAA ATAGCACACT GTGTAGG6AA 4020 

ACCTACCCTC GTACTGCITT ACTTCGTnA GATTCTTACT CTGHCCTCT 6T6GCTACTC 4080 

TCCCATCTTA AAAACGATCC AAGTGGTCCT TTTCCTCCTC CCTGCCCCCT ACCCCACACA 4140 

TCTCGTnrC CAGTGCGACA GCAAGHCAG CGTCTCCAGG ACTTGGCTCT GCTCTCACTC 4200 

CTTGAACCCT TAAAAGAAAA AGCTGGGTrT GAGCTATTTG CCTTTGAGTC ATGGAGACAC 4260 

AAAAGGTATT TAGGGTACAG ATCTAGAA6A AGA6AGAGAA CACCTAGATC CAACTGACCC 4320 

AGGAGATCTC GGGCTGGCCT CTAGTCCTCC TCCCTCAATC HAAAGCTAC AGTGATGTGG 4380 

CAAGTGGTAT HAGCTGHG TGGTTTTTCT GCTCTTTCTG GTCATGTTGA nCTGTTCn 4440 

TCGATACTCC AGCCCCCCAG GGAGTGAGH TCTCTGTCT6 TGCTGGGTTT GATATCTATG 4500 

TTCAAATCTT ATTAAATTGC CHCAAAAAA AAAAAAAAAA GGGAAACACT TCCTCCCAGC 4560 

CHGTAAGGG TTGGAGCCCT CTCCAGTATA TGCTGCAGAA TTTTTCTCTC GGTTTCTCAG 4620 

AGGATTATGG AGTCCGCCH AAAAAAGGCA AGCTCTGGAC ACTCTGCAAA GTAGAATG6C 4680 

CAAAGTTTGG AGHGAGTCG CCCCHGAAG GGTCACTGAA CCTCACAATT GTTCAAGCTG 4740 

TGTGGCGGGT TGTTACTGAA ACTCCCGGCC TCCCTGATCA GTTTCCCTAC AHGATCAAT 4800 

GGCTGAGTTT GGTCAGGAGC ACCCCTTCCA TGGCTCCACT CATGCACCAT TCATAATTTT 4860 



ACCTCCAAGG TCCTCCTGAG CCAGACCGTG TTTTCGCCTC GACCCTCAGC CGGTTCAGCT 4920 
CGCCCTGTAC TGCCTCTCTC TGAA6AAGAG GAGAGTCTCC CTCACCCAGT CCCACCGCCT 4980

TAAAACCAGC CTACTCCCTT AGGGTCATCC CATGTCTCCT CGGCTATGTC CCCTGTAGGC 5040 

TCATCACCCA nGCCTCTTG GTTGCAACCG TGGTGG6AGG AAGTAGCCCC TCTACTACCA 5100 

CTGAGAGAGG CACAAGTCCC TCT6GGTGAT 6AGTGCTCCA CCCCCTTCCT GGTTTATGTC 5160 

CCnCTTTCT ACnCTGACT TGTATAATTG GAAAACCCAT AATCCTCCCT TCTCTGAAAA 5220 

GCCCCAGGCT HGACCTCAC TGATG6AGTC TGTACTCTGG ACACATTGGC CCACCTGGGA 5280 

TGACTGTCAA CAGCTCCTTT TGACCCTTTT CACCTCTGAA GAGAGGGAAA GTATCCAAAG 5340 

AGAG6CCAAA AAGTACAACC TCACATCAAC CAATAGGCCG GAGGAGGAAG CTAGAGGAAT 5400 

AGTGATTAGA 6ACCCAATTG GGACCTAATT GGGACCCAAA TTTCTCAAGT GGAGGGAGAA 5460 

CTTTTGACGA TTTCCACCGG TATCTCCTCG TGGGTAHCA GGGAGCTGCT CAGAAACCTA 5520 

TAAACTTGTC TAAGGCGACT GAAGTCGTCC AGGGGCATGA TGAGTCACCA GGAGTGTnrT 5580 

TAGAGCACCT CCAGGAGGCT TATCGGATTT ACACCCCTTT TGACCTGGCA GCCCCCGAAA 5640 

ATASCCATGC TCTTAATTTG GCATTTGTGG CTCAGGCAGC CCCAGATAGT AAAAGGAAAC 5700 

TCCAAAAACT AGAGGGAITT TGCTGGAATG AATACCAGTC AGCTTTTAGA GATAGCCTAA 5760 

AAGGTTTTTG ACAGTCAAGA GGTT6AAAAA CAAAAACAAG CAGCTCAGGC AGCTGAAAAA 5820 

AGCCACT6AT AAAGCATCCT GGAGTATCAG AGTrTACTGT TAGATCAGCC TCATTTGACT 5880 

TCCCCTCCCA CATGGTGTTT AAATCCAGCT ACAQACnC CTGACTCAAA CTCCACTAH 5940 

CaGnTCATG"~ACTGTCAGGA ACTGHGGAA ACTACTGAAA CTGGCCGACClGAfcnCAA 6000 
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AATGTGCCCC TAGGAAAGGT GGATGCCACC 
AAGGGACTAC GAAAGGCCGG TGCAGCTGH 
GCTTTACCAG CAAACACCTC AGCACAAAAG 
CGATGGGGTA AGGATATTAA CGHAACACT 
GTACGTGGAG CCATCTACCA GGAGCGTGGG 
CTGTAAAGGA CATCAAAAGG AAAACACGGC 
AGCAGCTCAA GATGCAGTGT GACTTTCAGT 
C1TTCCACAG CCAGATCT6C CTGACAATCC 
TCAGAACTCA GAGCCAATAA AAATCAGGAA 
TTCATACCCC GAACTCnGG GAAAACTTTA 
GGAGGAGCAA AGCTACCTCA GCTCCTCCGG 
CTAACAGATC AAGCAGCTCT CCGGTGCACA 
CCTAAACCCA GCCCAGGCCA CCGTCTCCAA 
GACTTTACAG AAGTAAAACC ACACCGGGCT 
ACCnCTCTG GATG6ACTGA AGCATTTGCT 
AAGimTAC TCAATGAAAT CATCCCTCGA 
AATGGACCGG CCTTCGCCTT GTCTATA6TT 
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GTGnCACAG ACAGTAGCAG CTTCCTCGAG 6060 

ACCATGGAGA CAGATGIGH GTGGGCTCAG 6120 

GCTGAAHGA TCGCCCTCAC TCAGGCTCTC 6180 

GACAGCAGGT ACGCCTTTGC TACT6TGCAT 6240 

CTACTCACCT CAGCAGGTGG CTGTAATCCA 6300 

TGHGCCCGT GGTAACCAGA AAGCTGATTC 6360 

CACGCCTCTA AACHGCTGC CCACAGTCTC 6420 

CGCATACTCA ACAGAAGAAG AAAACTGGCC 6480 

GGTTGGTGGA TTCnCCTGA CTCTAGAATC 6540 

ATCAGTCACC TACAGTCTAC CACCCATTTA 6600 

AGCCGTTrTA A6ATCCCCCA TCTTCAAAGC 6660 

ACCTGC6CCC AGGTAAATGC CAAAAAA66T 6720 

GAAAACTCAC CAGGAGAAAA GTGGGAAATT 6780 

GGGTACAAAT ACCTTCTAGT ACTGGTAGAC 6840 

ACCAAAAACG AAACTGTCAA TATGGTA6TT 6900 

CGTGGGCTGC CTGTTGCCAT AGGGTCTGAT 6960 

TAGTCAGTCA GTAAGGCGTT AAACAHCAA 7020 



TGGAAGCTCC AHGIGCCTA TCGACCCCAG AGCTCTGGGC AAGTAGAACG CATGAACTGC 7080 
ACCCTAAAAA ACACTCTTAC AAAAHAATC TTAGAAACCG GTGTAAATTG TGTAAGTCTC 7140
CTTCCTTTAG CCCTACHAG AGTAAGGTGC ACCCCHACT GGGCTGGGTT CTTACCTTTT 7200 
GAAATCATGT ATGGGAGGGC GCTGCCTATC HGCCTAAGC TAAGAGATGC CCAA7TGGCA 7260 

AAAATATCAC AAACTAATTT ATTACAGTAC CTACAGTCTC CCCAACAGGT ACAAGATATC 7320 

ATCCTGCCAC TTGTTCGAQG AACCCATCCC AATCCAATTC CTGAACAGAC AGGGCCCTGC 7380 

CAHCAnCC CGCCAGGTGA CCTGnGTTT GHAAAAAGT TCCAGAGAGA AGGACTCCCT 7440 

CCTGCnGGA AGAGACCTCA CACCGTCATC ACGATGCCAA CGGCTCTGAA GGTGGATG6C 7500 

ATTCCTGCGT G6ATTCATCA CTCCCGCATC AAAAAGGCCA ACGGAGCCCA ACTAGAAACA 7560 

TGG6TCCCCA GGGCTGGGTC AGGCCCCTTA AAACTGCACC TAAGnGGGT GAAGCCATTA 7620 

GATTAATTCT TTTTCTTAAT TTTGTAAAAC AATGCATAGC nCTGTCAAA CTTATGTATC 7680 

TTAAGACTCA ATATAACCCC CTTGnATAA CTGAGGAATC AATGATTTGA TTCCCCAAAA 7740 

ACACAAGTGG GGAATGTAGT GTCCAACCTG GTTTTTACTA ACCCTGTTn TAGACTCTCC 7800 

CTTTCC7TTA ATCACTCAGC CnGTTTCCA CCTGAATTGA CTCTCCCTTA GCTAAGAGCG 7860 

CCAGATGGAC TCCATCTTGG CTCTTTCACT GGCAGCCGCT TCCTCAAGGA CTTAACTTGT 7920 

GCAAGCTGAC TCCCAGCACA TCCAAGAATG CAATTAACTG ATAAGATACT GTGGCAAGCT 7980 

ATATCCGCAG TTCCCAGGAA TTCGTCCAAT TGATTACACC CAAAAGCCCC GCGTCTATCA 8040 

CC7TGTAATA ATCTTAAAGC CCCTGCACCT GGAACTATTA ACGHCCTGI AACCATTTAT 8100 

CCnTTAACT mTTGCCTA CTTTATTTCT GTAAAAHGT TTTAACTAGA CCCCCCCTCT 8160 
CCTTTCTAAA CCAAAGTATA AAAGCAAATC TAGCCCCHC HCAGGCCGA GAGAATTTCG 8220

AGCGTTAGCC GTCTCTTGGC CACCAGCTAA ATAAACGGAT TCTTCATGTG TCTCAAAGTG 8280 

TGGCGTnrC TCTAACTCGC TCAGGTACGA CCGTGGTAGT ATTTTCCCCA ACGTCTTATT 8340 

TTTAGGGCAC GTATGTAGAG TAACTTTTAT GAAAGAAACC AGHAAGGAG GnTTGGGAT 8400 

nCCTTTATC AACTGTAATA CIGGTITTGA nATTTATTT ATTTATTTAT miNiGAG 8460 

AAG6AGTTTC ACTCnGHG CCCAGGCTGG AGTGCAATGG TGCGATCTTG GCTCACTGCA 8520 

ACTTCCGCCT CCCAGGHCA AGCGATTCTC CTGCCTCAGC CTCGAGAGTA GCTGGGATTA 8580 

TAGGCATGCG CCACCACACC CAGCTAATTT TGTATTTTTA GTAAAGATGG GGTTTCTTCA 8640 

TGTTGGTCAA GCTGGTCTGG AACTCCCCGC CTCGGGTGAT CTGCCCGCCT CGGCCTCCGA 8700 

AAGTGCTGGG ATTACAGGTG TGATCCACCA CACCCAGCCG ATTTATATGT ATATAAATCA 8760 

CATTCCTCTA ACCAAAATGT AGTGTITCCT TCCATCTTGA ATATAGGCTG TAGACCCCGT 8820 

GGGTATGGGA CATTCnAAC AGTGAGACCA CAGCAGTTn TATGTCATCT GACAGCATCT 8880 

CCAAATAGCC TTCATGGTTG TCACTGCTTC CCAAGACAAT TCCAAATAAC ACnCCCAGT 8940 

GATGACHGC TACHGCTAT TGTTACTTAA TGTGTTAAGG TGGCTGTTAC AGACACTATT 9000 

AGTATGTCAG GAAHACACC AAAATTTAGT GGCTCAAACA ATCATTTTAT TATGTATGTG 9060 

GATTCTCATG GTCAGGTCAG GATTTCAGAC AGGGCACAAG GGTAGCCCAC nGTCTCTGT 9120 

CTATGATGTC TGGCCTCAGC ACAGGAGACT CAACAGCTGG GGTCTGGGAC CATTTGGAGG 9180 



CnGTTCCCT CACATCTGAT ACCTGGCTTG GGATGHGGA AGAGGGGGTG AGCTGAGACT 9240 
GAGTGCCTAT ATGTAGTGTT TCCATATGGC CTTGACTTCC HACAGCCTG GCA6CCTCAG 9300

GGTAGTCAGA ATTCnAGGA GGCACAGGGC TCCAGGGCAG ATGCTGAG6G GTCTiTTATG 9360 

AGGTAGCACA GCAAATCCAC CCAGGATC 9388 
(2) INFORMATION FOR SEQ ID NO:12:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3646 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:12:

GGGAAACACT TCCTCCCA6C CTTGTAAGGG TTGGAGCCCT CTCCAGTATA TGCTGCAGAA 60 

mTTCTCTC GGTTTCTCAG AGGAHATGG AGTCCGCCTT AAAAAAGGCA AGCTCTGGAC 120 

ACTCTGCAAA GTAGAATGGC CAAA6TTTGG AGTTGAGTGG CCCCTTGAAG GGTCACTGAA 180 

CCTCACAATT GTTCAAGCTG TGTGGCGGGT TGTTACTGAA ACTCCCGGCC TCCCTGATCA 240 

GTTTCCCTAC AHGATCAAT GGCTGAGTH 6GTCAGGAGC ACCCCHCCG TGGCTCCACT 300 

CATGCACCAT TCATAATTn ACCTCCAAGG TCCTCCTGAG CCA6ACCGTG TTTTCGCCTC 360 

GACCCTCAGC CGGHCGGCT CGCCCTGTAC T6CCTCTCTC TGAAGAA6AG GAGAGTCTCC 420 

CTCACCCAGT CCCACCGCCT TAAAACCAGC CTACTCCCH AGGGTCATCC CAT6TCTCCT 480 

CGGCTAT6TC CCCTGTAGGC TCATCACCCA TTGCCTCHG GTTGCAACCG TGGTGGGAGG 540 

AAGTAGCCCC TCTACTACCA CTGAGAGAGG CACAAGTCCC TCTGGGTGAT GAGTGCTCCA 600

CCCCCTTCCT GGTTTATGTC CCTTCTTTCT ACHCTGACT IGTATAAHG GAAAACCCAT 660

AATCCTCCCT TCTCTGAAAA GCCCCAGGCT TTGACCTCAC TGATGGAGTC TGTACTCTGG 720 

ACACATTGGC CCACCTGGGA TGACTGTCAA CA6CTCCTTT TGACCCTTTT CACCTCTGAA 780 

GAGAGGGAAA GTATCCAAAG AGA6GCCAAA AA6TACAACC TCACATCAAC CAATAGGCCG 840 

GAGGAGGAAG CTAGAGGAAT AGTGAHAGA GACCCAATTG G6ACCTAATT GGGACCCAAA 900 

TTTCTCAAGT GGAG6GAGAA CTTTTGACGA TTTCCACC6G TATCTCCTCG TGGGTATTCA 960 

6GGAGCTGCT CAGAAACCTA TAAACTTGTC TAAGGCGACT GAAGTCGTCC AGGGGCATGA 1020 

TGAGTCACCA GGAGTGTITT TAGAGCACCT CCAGGAGGCT TATCAGATTT ACACCCCTTT 1080 

TGACCT6GCA GCCCCCGAAA ATAGCCATGC TCTTAATTTG GCATTTGTGG CTCAGGCAGC 1140 

CCCAGATAGT AAAAGGAAAC TCCAAAAACT AGAGGGATTT TGCTGGAATG AATACCAGTC 1200 

AGCTTTTAGA GATAGCCTAA AAGGTTTTTG ACAGTCAAGA GGHGAAAAA CAAAAACAAG 1260 

CAGCTCAGGC AGCTGAAAAA AGCCACTGAT AAAGCATCCT GGAGTATCAG AGTTTACTGT 1320 

TAGATCAGCC TCATTTGACT TCCCCTCCCA CATGGTGTTT AAATCCAGCT ACACTACTTC 1380 

CTGACTCAAA CTCCACTATT CCTGTTCATG ACTGTCAGGA ACTGTTGGAA ACTACT6AAA 1440 

CTG6CCGACC TGATCHCAA AATGTGCCCC TAGGAAAGGT GGATGCCACC ATGTTCACAG 1500 

ACAGTAGCAG CTTCCTCGAG AAGGGACTAC GAAAGGCCGG TGCAGCTGTT ACCATGGAGA 1560 

CAGATGTGn GTGGGCTCAG GCTTTACCAG CAAACACCTC AGCACAAAAG GCTGAAHGA 1620 



TCGCCCTCAC TCAGGCTCTC CGATGGGGTA AGGATAHAA CG7TAACACT GACAGCAGGT 1680 
ACGCCTTTGC TACTGTGCAT GTACGTGGAG CCATCTACCA GGAGCGTGGG CTACTCACCT 1740
CAGCA6GTGG CTGTAATCCA CTGTAAAG6A CATCAAAAGG AAAACACGGC TGHGCCCGT 1800 
GGTAACCAGA AAGCTGATTC AGCAGCTCAA 6ATGCAGTGT GACTTTCAGT CACGCCTCTA 1860 

AACTTGCTGC CCACAGTCTC CTTTCCACAG CCAGATCTGC CTGACAATCC CGCATACTCA 1920 

ACAGAAGAAG AAAACTGGCC TCAGAACTCA GAGCCAATAA AAATCAGGAA GGHGGTGGA 1980 

TTCTTCCTGA CTCTAGAATC TTCATACCCC GAACTCTTGG GAAAACTTTA ATCAGTCACC 2040 

TACAGTCTAC CACCCATTTA GGAGGAGCAA AGCTACCTCA GCTCCTCCGG AGCCGTTTTA 2100 

AGATCCCCCA TCTTCAAAGC CTAACAGATC AAGCAGCTCT CCGGTGCACA ACCTGCGCCC 2160 

AGGTAAATGC CAAAAAAGGT CCTAAACCCA GCCCAGGCCA CCGTCTCCAA GAAAACTCAC 2220 

CAGGAGAAAA GTGGGAAATT GACTTTACAG AAGTAAAACC ACACCGGGCT GGGTACAAAT 2280 

ACCnCTAGT ACTGGTAGAC ACCTTCTCTG GATGGACTGA AGCATTTGCT ACCAAAAACG 2340 

AAACTGTCAA TATG6TAGTT AAGTmTAC TCAATGAAAT CATCCCTCGA CATGGGCTGC 2400 

CTGITTGCCA TAG6GTCTGA TAAT6GACCG GCCTTCGCCT TGTCTATAGT TTAGTCAGTC 2460 

AGTAAGGCGT TAAACATTCA ATGGAAGCTC CAnOTGCCT ATCGACCCCA GAGCTCTG6G 2520 

CAAGTAGAAC 6CATGAACT6 CACCCTAAAA AACACTCTTA CAAAATTAAT CHAGAAACC 2580 

GGTGTAAATT GTGTAAGTCT CCTTCCTTTA GCCCTACHA GAGTAAGGTG CACCCCTTAC 2640 

TGGGCTGGGT TCnACCTTT TGAAATCAT6 TATGGGAGGG TGCTGCCTAT CTTGCCTAAG 2700 

CTAAGAGATGlXCAATTGGC AAAAATATCA CAAACTAAH TATTACAGTA CCTACAGTCT 2760 
CCCCAACAGG TACAAGATAT CATCCTGCCA CHGITCGAG GAACCCATCC CAATCCAATT 2820

CCTGAACAGA CAGGGCCCTG CCATTCATTC CCGCCAGGTG ACCTGTTGn T6TTAAAAAG 2880 

TTCCAGAGAG AAGGACTCCC TCCTGCnGG AAGAGACCTC ACACC6TCAT CACGATGCCA 2940 

ACGGCTCTGA AGGTGGATGG CATTCCTGCG IGGAHCATC ACTCCCGCAT- CAAAAAGGCC 3000 

AACAGAGCCC AACTAGAAAC ATGGGTCCCC AGGGCTGGGT CAGGCCCCH AAAACTGCAC 3060 

CTAAGTTGGG TGAAGCCAH AGATTAATTC TTTTTCnAA TnTGTAAAA CAATGCATAG 3120 

CTTCTGTCAA ACHATGIAT CHAAGACTC AATATAACCC CCnGHATA ACTGAGGAAT 3180 

CAATGATTTG ATTCCCCCAA AAACACAAGT GGGGAATGTA GTGTCCAACC TGGTmTAC 3240 

TAACCCTGTT TTTAGACTCT CCCTTTCCn TAATCACTCA GCnGTTTCC ACCTGAAHG 3300 

ACTCTCCCTT AGCTAAGAGC GCCAGATGGA CTCCATCTTG GCTCTTTCAC TGGCA6CCGC 3360 

TTCCTCAAGG ACHAACnG TGCAA6CTGA CTCCCAGCAC ATCCAAGAAT GCAAHAACT 3420 

GATAAGATAC TGTGGCAAGC TATATCCGCA GHCCCAGGA ATTCGTCCAA TTGATCACAG 3480 

CCCCTCTACC CnCAGCAAC CACCACCCTG ATCAGTCA6C AGCCATCAGC ACCGAGGCAA 3540 

G6CCCTCCAC CAGCAAAAAG ATTCTGACTC ACTGAAGACT TGGATGATCA TTAGTATTTT 3600 

TAGCAGTAAA GTTTTTTTTT CTTnTClTT CIIIIIIICT CGTGCC 3646 

(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 10 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: Single

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 
CCTCAACCTC 

(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 
ATGGCTATTT TCGGGGGCTG ACA 

(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 
CCGGTATCTC CTCGTGGGTA TT 

(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 10 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 
CTTCAACCTC 

(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 
CTGCCTGAGC CACAAATG 

(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 
CCGGAGGAGG AAGCTAGAGG AATA 
(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 
TTTTTTTTTT TTAG 

(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 

Ser Ser Gly Gly Arg Thr Phe Asp Asp Phe His Arg Tyr Leu Leu Val 
1 5 10 15 

Gly Ile 



(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 

Gln Gly Ala Ala Gln Lys Pro Ile Asn Leu Ser Lys Xaa Ile Glu Val 
1 5 10 15 

Val Gln Gly His Asp Glu 
20 

(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 

Ser Pro Gly Val Phe Leu Glu His Leu Gln Glu Ala Tyr Arg Ile Tyr 
1 5 10 15 

Thr Pro Phe Asp Leu Ser Ala 
20 

(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 9 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 

Tyr Leu Leu Val Gly Ile Gln Gly Ala 
1 5 

(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 9 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 

Gly Ala Ala Gln Lys Pro Ile Asn Leu 
1 5 

(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 9 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25: 

Asn Leu Ser Lys Xaa Ile Glu Val Val 
1 5 

(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 9 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: 

Glu Val Val Gln Gly His Asp Glu Ser 
1 5 

(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 9 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 

His Leu Gln Glu Ala Tyr Arg Ile Tyr 
1 5 

(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 9 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 

Asn Leu Ala Phe Val Ala Gln Ala Ala 
1 5 

(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 9 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 

Phe Val Ala Gln Ala Ala Pro Asp Ser 
1 5 
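
Each entry in the sequence listing declares a LENGTH that can be checked against the sequence printed under it. The following is a minimal sketch of such a check, not part of the original filing. The function names and the selection of entries are illustrative assumptions, and the three-letter residue codes are given as read after resolving scanning errors ("He" taken as Ile, "Gin" as Gln).

```python
# Entries transcribed from the listing above:
# (SEQ ID NO, declared LENGTH, kind, sequence as printed).
# Assumption: scanning errors in residue names are resolved as Ile and Gln.
ENTRIES = [
    (13, 10, "nucleic", "CCTCAACCTC"),
    (17, 18, "nucleic", "CTGCCTGAGC CACAAATG"),
    (18, 24, "nucleic", "CCGGAGGAGG AAGCTAGAGG AATA"),
    (23, 9, "amino", "Tyr Leu Leu Val Gly Ile Gln Gly Ala"),
    (26, 9, "amino", "Glu Val Val Gln Gly His Asp Glu Ser"),
    (29, 9, "amino", "Phe Val Ala Gln Ala Ala Pro Asp Ser"),
]


def observed_length(kind: str, sequence: str) -> int:
    """Count residues (three-letter tokens) or bases (letters, ignoring spacing)."""
    tokens = sequence.split()
    if kind == "amino":
        return len(tokens)
    return sum(len(token) for token in tokens)


def length_mismatches(entries):
    """Return (SEQ ID NO, declared, observed) for every entry that disagrees."""
    return [
        (seq_id, declared, observed_length(kind, seq))
        for seq_id, declared, kind, seq in entries
        if observed_length(kind, seq) != declared
    ]


mismatches = length_mismatches(ENTRIES)
print(mismatches)
```

An empty result means every listed entry matches its declared length. A non-empty result points to an entry worth re-reading against the original document.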

Claims 

1. An isolated DNA molecule, comprising: 

(a) a human endogenous retroviral sequence, wherein said retroviral 
sequence is preferentially expressed in a tumor tissue; 

(b) a variant of said human endogenous retroviral sequence that contains 
one or more nucleotide substitutions, deletions, insertions and/or modifications at no more 
than 20% of the nucleotide positions, such that the antigenic and/or immunogenic properties 
of the polypeptide encoded by the human endogenous retroviral sequence are retained; or 

(c) a nucleotide sequence encoding an epitope of a polypeptide encoded 
by at least one of the above sequences. 

2. An isolated DNA molecule encoding an epitope of a polypeptide, 
wherein said polypeptide is encoded by: 

(a) a nucleotide sequence transcribed from the sequence of SEQ ID NO:11; or 

(b) a variant of said nucleotide sequence that contains one or more 
nucleotide substitutions, deletions, insertions and/or modifications at no more than 20% of 
the nucleotide positions, such that the antigenic and/or immunogenic properties of the 
polypeptide encoded by the nucleotide sequence are retained. 

3. A recombinant expression vector comprising a DNA molecule 
according to claim 1 or claim 2. 

4. A host cell transformed or transfected with an expression vector 
according to claim 3. 

5. A polypeptide comprising an amino acid sequence encoded by a DNA 
molecule according to claim 1 or claim 2. 

6. A monoclonal antibody that binds to a polypeptide according to claim 5. 

7. A method for determining the presence of a cancer in a patient 
comprising detecting, within a biological sample obtained from a patient, a polypeptide 
according to claim 5, and therefrom determining the presence of cancer in the patient. 

8. The method of claim 7 wherein the biological sample is a tumor sample. 

9. The method of claim 7 wherein the step of detecting comprises 
contacting the biological sample with a monoclonal antibody according to claim 6. 

10. The method of claim 7 wherein the polypeptide comprises an amino 
acid sequence encoded by a human endogenous retroviral sequence selected from the group 
consisting of SEQ ID NO:1, SEQ ID NO:3 - SEQ ID NO:10 and SEQ ID NO:12. 

11. A method for determining the presence of a cancer in a patient 
comprising detecting, within a biological sample obtained from a patient, an RNA molecule 
encoding a polypeptide according to claim 5, and therefrom determining the presence of 
cancer in the patient. 



12. The method of claim 11 wherein the biological sample is a tumor sample. 

13. The method of claim 11 wherein the step of detecting comprises: 

(a) preparing cDNA from RNA molecules within the biological sample; and 

(b) specifically amplifying cDNA molecules that are capable of encoding 
at least a portion of a polypeptide according to claim 5. 

14. The method of claim 11 wherein the polypeptide comprises an amino 
acid sequence encoded by a human endogenous retroviral sequence selected from the group 
consisting of SEQ ID NO:1, SEQ ID NO:3 - SEQ ID NO:10 and SEQ ID NO:12. 

15. A polypeptide according to claim 5 for use within a method for 
detecting the presence of a cancer in a patient. 

16. The polypeptide of claim 15 wherein the polypeptide comprises an 
amino acid sequence encoded by a human endogenous retroviral sequence selected from the 
group consisting of SEQ ID NO:1, SEQ ID NO:3 - SEQ ID NO:10 and SEQ ID NO:12. 



17. A method for monitoring the progression of a cancer in a patient, comprising: 

(a) detecting an amount, in a biological sample obtained from a patient, of 
a polypeptide according to claim 5; 

(b) subsequently repeating step (a); and 

(c) comparing the amounts of polypeptide detected in steps (a) and (b), 
and therefrom monitoring the progression of cancer in the patient. 

18. The method of claim 17 wherein the biological sample is a tumor sample. 

19. The method of claim 17 wherein the step of detecting comprises 
contacting a portion of the biological sample with a monoclonal antibody according to 
claim 6. 

20. The method of claim 17 wherein the polypeptide comprises an amino 
acid sequence encoded by a human endogenous retroviral sequence selected from the group 
consisting of SEQ ID NO:1, SEQ ID NO:3 - SEQ ID NO:10 and SEQ ID NO:12. 

21. A method for monitoring the progression of a cancer in a patient, comprising: 

(a) detecting an amount, within a biological sample obtained from a 
patient, of an RNA molecule encoding a polypeptide according to claim 5; 

(b) subsequently repeating step (a); and 

(c) comparing the amounts of RNA molecules detected in steps (a) and 
(b), and therefrom monitoring the progression of cancer in the patient. 

22. The method of claim 21 wherein the step of detecting comprises: 

(a) preparing cDNA from RNA molecules within the biological sample; and 

(b) specifically amplifying cDNA molecules that are capable of encoding 
at least a portion of a polypeptide according to claim 5. 

23. The method of claim 21 wherein the polypeptide comprises an amino 
acid sequence encoded by a human endogenous retroviral sequence selected from the group 
consisting of SEQ ID NO:1, SEQ ID NO:3 - SEQ ID NO:10 and SEQ ID NO:12. 



24. A pharmaceutical composition, comprising: 

(a) a polypeptide according to claim 5; and 

(b) a physiologically acceptable carrier. 

25. A vaccine, comprising: 

(a) a polypeptide according to claim 5; and 

(b) an immune response enhancer. 

26. A diagnostic kit comprising: 

(a) one or more monoclonal antibodies according to claim 6; and 

(b) a detection reagent. 

27. The kit of claim 26 wherein the monoclonal antibody(s) are 
immobilized on a solid support. 

28. A diagnostic kit comprising a first polymerase chain reaction primer 
and a second polymerase chain reaction primer, the first and second primers each comprising 
at least about 10 contiguous nucleotides of an RNA molecule encoding a polypeptide 
according to claim 5. 

29. A diagnostic kit comprising at least one oligonucleotide probe, the 
oligonucleotide probe comprising at least about 15 contiguous nucleotides of a DNA 
molecule according to claim 1 or claim 2. 
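
Claims 1(b) and 2(b) limit a variant to changes at no more than 20% of the nucleotide positions. The sketch below shows one way that sequence criterion could be evaluated for two pre-aligned sequences; it is an illustration under stated assumptions, not a method taken from the specification. The function names are hypothetical, gaps are written as "-", and the percentage is computed over alignment columns, which is only one possible reading of "nucleotide positions". The further requirement that antigenic and/or immunogenic properties are retained is experimental and is not addressed here.

```python
def fraction_changed(reference: str, variant: str) -> float:
    """Fraction of aligned positions at which two sequences differ.

    Assumption: both strings are already aligned to equal length, with '-'
    marking an inserted or deleted position.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be pre-aligned to equal length")
    if not reference:
        raise ValueError("sequences must not be empty")
    differences = sum(1 for r, v in zip(reference, variant) if r != v)
    return differences / len(reference)


def within_variant_limit(reference: str, variant: str, limit: float = 0.20) -> bool:
    """True when the variant differs at no more than `limit` of the positions."""
    return fraction_changed(reference, variant) <= limit


# Illustrative 10-nucleotide reference with one and then three substitutions.
print(within_variant_limit("CCGGTATCTC", "CCGGTATGTC"))
print(within_variant_limit("CCGGTATCTC", "AAGGTATGTC"))
```

The first comparison differs at one position in ten and falls within the limit; the second differs at three positions in ten and does not.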



WO 97/25431 PCT/US97/00398 




B18Ag1 



Fig. 1 



SUBSTITUTE SHEET (RULE 26) 






Fig. 3 










NUCLEOTIDE SEQUENCE OF THE REPRESENTATIVE 
BREAST-TUMOR SPECIFIC cDNA B18Ag1 



TTA GAG ACC CAA TTG GGA CCT AAT TGG GAC CCA AAT TTC TCA AGT GGA 48 
Leu Glu Thr Gln Leu Gly Pro Asn Trp Asp Pro Asn Phe Ser Ser Gly 
1 5 10 15 

GGG AGA ACT TTT GAC GAT TTC CAC CGG TAT CTC CTC GTG GGT ATT CAG 96 
Gly Arg Thr Phe Asp Asp Phe His Arg Tyr Leu Leu Val Gly Ile Gln 
20 25 30 

GGA GCT GCC CAG AAA CCT ATA AAC TTG TCT AAG GCG ATT GAA GTC GTC 144 
Gly Ala Ala Gln Lys Pro Ile Asn Leu Ser Lys Ala Ile Glu Val Val 
35 40 45 

CAG GGG CAT GAT GAG TCA CCA GGA GTG TTT TTA GAG CAC CTC CAG GAG 192 
Gln Gly His Asp Glu Ser Pro Gly Val Phe Leu Glu His Leu Gln Glu 
50 55 60 

GCT TAT CGG ATT TAC ACC CCT TTT GAC CTG GCA GCC CCC GAA AAT AGC 240 
Ala Tyr Arg Ile Tyr Thr Pro Phe Asp Leu Ala Ala Pro Glu Asn Ser 
65 70 75 80 

CAT GCT CTT AAT TTG GCA TTT GTG GCT CAG GCA GCC CCA GAT AGT AAA 288 
His Ala Leu Asn Leu Ala Phe Val Ala Gln Ala Ala Pro Asp Ser Lys 
85 90 95 

AGG AAA CTC CAA AAA CTA GAG GGA TTT TGC TGG AAT GAA TAC CAG TCA 336 
Arg Lys Leu Gln Lys Leu Glu Gly Phe Cys Trp Asn Glu Tyr Gln Ser 
100 105 110 

GCT TTT AGA GAT AGC CTA AAA GGT TTT 363 
Ala Phe Arg Asp Ser Leu Lys Gly Phe 
115 120 



Fig. 6 
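
The pairing of codons and residues in Fig. 6 can be reproduced with the standard genetic code. The sketch below translates the 363-nucleotide B18Ag1 sequence as read from the figure and locates several of the 9-residue peptides from the sequence listing within the translation. It is an illustration only: the variable and function names are assumptions, and the one-letter peptide strings were converted by hand from the three-letter listing entries (with "He" read as Ile and "Gin" as Gln).

```python
# Codons as read from Fig. 6, one figure row per string.
B18AG1_ROWS = [
    "TTA GAG ACC CAA TTG GGA CCT AAT TGG GAC CCA AAT TTC TCA AGT GGA",
    "GGG AGA ACT TTT GAC GAT TTC CAC CGG TAT CTC CTC GTG GGT ATT CAG",
    "GGA GCT GCC CAG AAA CCT ATA AAC TTG TCT AAG GCG ATT GAA GTC GTC",
    "CAG GGG CAT GAT GAG TCA CCA GGA GTG TTT TTA GAG CAC CTC CAG GAG",
    "GCT TAT CGG ATT TAC ACC CCT TTT GAC CTG GCA GCC CCC GAA AAT AGC",
    "CAT GCT CTT AAT TTG GCA TTT GTG GCT CAG GCA GCC CCA GAT AGT AAA",
    "AGG AAA CTC CAA AAA CTA GAG GGA TTT TGC TGG AAT GAA TAC CAG TCA",
    "GCT TTT AGA GAT AGC CTA AAA GGT TTT",
]

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1, ignoring whitespace."""
    dna = "".join(dna.split()).upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


dna = "".join(" ".join(B18AG1_ROWS).split())
protein = translate(dna)

# Peptides from the listing, hand-converted to one-letter code (assumption).
PEPTIDES = {
    23: "YLLVGIQGA",
    24: "GAAQKPINL",
    26: "EVVQGHDES",
    27: "HLQEAYRIY",
    28: "NLAFVAQAA",
    29: "FVAQAAPDS",
}

# 1-based start of each peptide in the translation; 0 means "not found".
positions = {seq_id: protein.find(pep) + 1 for seq_id, pep in PEPTIDES.items()}
print(len(dna), len(protein))
print(positions)
```

A start position of 0 for any peptide would indicate a disagreement between the listing entry and the figure as transcribed here.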